Description

This invention relates to compilers for digital computer programs, and more particularly to a compiler framework 
that is adapted to be used with a number of different computer languages, to generate code for a number of different 
target machines.

Compilers are usually constructed for translating a specific source language to object code for execution on a 
specific target machine which has a specific operating system. For example, a Fortran compiler may be available for 
generating code for a computer having the VAX architecture using the VMS operating system, or a C compiler for an
80386 computer executing MS/DOS. Intermediate parts of these language- and target-specific compilers share a great
deal of common structure and function, however, and so construction of a new compiler can be aided by using some
of the component parts of an existing compiler, and modifying others. Nevertheless, it has been the practice to construct
new compilers for each combination of source language and target machine, and when new and higher-performance
computer architectures are designed the task of rewriting compilers for each of the commonly-used source languages 
is a major task. 

The field of computer-aided software engineering (CASE) is heavily dependent upon compiler technology. CASE
tools and programming environments are built upon core compilers. In addition, performance specifications of computer
hardware are often integrally involved with compiler technology. The speed of a processor is usually measured in
high-level language benchmarks, so optimizing compilers can influence the price-performance factor of new
computer equipment.

In order to facilitate construction of compilers for a variety of different high-level languages, and different target
computer architectures, it is desirable to enhance the commonality of core components of the compiler framework.
The front end of a compiler directly accesses the source code module, and so necessarily is language-specific; a
compiler front end constructed to interpret Pascal would not be able to interpret C. Likewise, the code generator in the
back end of a compiler has to use the instruction set of the target computer architecture, and so is machine-specific.
Thus, it is the intermediate components of a compiler that are susceptible to being made more generic. The compiler front
end usually functions to first translate the source code into an intermediate language, so that the program that was
originally written in the high-level source language appears in a more elemental language for the internal operations
of the compiler. The front end usually produces a representation of the program or routine, in intermediate language,
in the form of a so-called graph, along with a symbol table. These two data structures, the intermediate language graph
and the symbol table, are the representation of the program as used internally by the compiler. Thus, by making the
intermediate language and construction of the symbol table of universal or generic character, the components following
the front end can be made more generic.

After the compiler front end has generated the intermediate language graph and symbol table, various optimizing
techniques are usually implemented. The flow graph is rearranged, meaning the program is rewritten, to optimize speed
of execution on the target machine. Some optimizations are target-specific, but most are generic. Commonly-used
optimizations are code motion, strength reduction, etc. Next in the internal organization of a compiler is the register
and memory allocation. Up to this point, data references were to variables and constants by name or in the abstract,
without regard to where stored; now, however, data references are assigned to more concrete locations, such as
specific registers and memory displacements (not memory addresses yet). At this point, further optimizations are
possible, in the form of register allocation to maintain data in registers and minimize memory references; thus the program
may be again rearranged to optimize register usage. Register allocation is also somewhat target machine dependent,
and so the generic nature of the compiler must accommodate specifying the number, size and special assignments
for the register set of the target CPU. Following register and memory allocation, the compiler implements the code
generation phase, in which object code images are produced, and these are of course in the target machine language
or instruction set, i.e., machine-specific. Subsequently, the object code images are linked to produce executable
packages, adding various runtime modules, etc., all of which is machine-specific.

A prior art document is: Communications of the ACM, vol. 26, No. 9, New York, USA; A.S. Tanenbaum et al., pages
654-660. Here a common intermediate language, global optimization, and peephole optimization are used so that M source
languages can be compiled for N target machines using M front ends and N back ends.

In a typical compiler implementation, it is thus seen that the structure of the intermediate language graph, and the
optimization and register and memory allocation phases, are those most susceptible to being made more generic.
However, due to substantive differences in the high-level languages most commonly used today, and differences in
target machine architecture, obstacles exist to discourage construction of a generic compiler core.

SUMMARY OF THE INVENTION

The invention is defined by the appended claims. 

In accordance with one embodiment of the invention, a compiler framework is provided which uses a generic "shell"
or control and sequencing mechanism, and a generic back end (where the code generator is of course target-specific).
The generic back end provides the functions of optimization, register and memory allocation, and code generation.
The shell may be executed on various host computers, and the code generation function of the back end may be
targeted for any of a number of computer architectures. A front end is tailored for each different source language, such
as Cobol, Fortran, Pascal, C, C++, Ada, etc. The front end scans and parses the source code modules, and generates
from them an intermediate language representation of the programs expressed in the source code. This intermediate
language is constructed to represent any of the source code languages in a universal manner, so the interface between
the front end and back end is of a standard format, and need not be rewritten for each language-specific front end.
The intermediate language representation generated by the front end is based upon a tuple as the elemental unit,
where each tuple represents a single operation to be performed, such as a load, a store, an add, a label, a branch,
etc. A data structure is created by the front end for each tuple, with fields for various necessary information. Along with
the ordered series of tuples, the front end generates a symbol table for all references to variables, routines, labels,
etc., as is the usual practice. The tuples are in ordered sequences within blocks, where a block is a part of the code
that begins with a routine or label and ends in a branch, for example, where no entry or exit is permitted between the
start and finish of a block. Each block is also a data structure, or node, and contains pointers to its successors and
predecessors (these being to symbols in the symbol table). The interlinked blocks make up a flow graph, called the
intermediate language graph, which is the representation of the program used by the back end to do the optimizations,
register and memory allocations, etc.

One of the features of the invention is a mechanism for representing effects and dependencies in the interface
between front end and back end. A tuple has an effect if it writes to memory, and has a dependency if it reads from a
location which some other node may write to. Various higher level languages have differing ways of expressing
operations, and the same sequence may in one language allow a result or dependency, while in another language it may
not. Thus, a mechanism which is independent of source language is provided for describing the effects of program
execution. This mechanism provides a means for the compiler front end to generate detailed language-specific
information for the multi-language optimizer in the compiler back end. This mechanism is used by the global optimizer to
determine legal and effective optimizations, including common subexpression recognition and code motions. The
intermediate language and structure of the tuples contain information so that the back end (optimizers) can ask questions
of the front end (obtain information from the intermediate language graph), from which the back end can determine
when the execution of the code produced for the target machine for one tuple will affect the value computed by code
for another tuple. The interface between back end and front end is in this respect language independent. The back
end does not need to know what language it is compiling. The advantage is that a different back end (and shell) need
not be written for each source language, but instead an optimizing compiler can be produced for each source language
by merely tailoring a front end for each different language.

An additional feature of one embodiment of the invention is a mechanism for "folding constants" (referred to as
K-folding or a KFOLD routine), included as one of the optimizations. This mechanism is for finding occurrences where
expressions can be reduced to a constant and calculated at compile time rather than a more time-consuming calculation
during runtime. An important feature is that the KFOLD code is built by the compiler framework itself rather than having
to be coded or calculated by the user. The KFOLD builder functions as a front end, like the other language-specific
front ends, but there is no source code input; instead, the input is in intermediate language and merely consists of a
listing of all of the operators and all of the data types. The advantage is that a much more thorough KFOLD package
can be generated, at much lower cost.
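
By way of illustration only, the general idea of folding constants may be sketched in Python as follows. This is not the KFOLD routine or its builder; the expression encoding, the operator table FOLDABLE and the function fold are hypothetical names assumed for the example, and only three integer operators are covered.

```python
import operator

# Hypothetical operator table. A real folding package would cover every
# IL operator and data type; this sketch handles three integer operators.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold(expr):
    """Return expr with every all-constant subexpression replaced by its value.

    An expression is an int literal, a variable name (str), or an
    (op, left, right) triple.
    """
    if not isinstance(expr, tuple):
        return expr  # literal or variable: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int) and op in FOLDABLE:
        return FOLDABLE[op](left, right)  # evaluated at compile time
    return (op, left, right)  # depends on a runtime value, keep the operation


# J + 2 * (3 + 4): the constant part folds, the reference to J remains.
folded = fold(("add", "J", ("mul", 2, ("add", 3, 4))))
```

Here the subexpression 2 * (3 + 4) is reduced to a single literal, while the addition involving the variable J is left for runtime.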

A further feature of one embodiment is the type definition mechanism, referred to as the TD module. This module
provides mechanisms used by the front end and the compiler of the back end in constructing program type information
to be incorporated in an object module for use by a linker or debugger. The creation of type information takes place
in the context of symbol table creation and allows a front end to specify to the back end an abstract representation of
program type information. The TD module provides service routines that allow a front end to describe basic types and
abstract types.

In addition, a feature of one embodiment is a method for doing code generation using code templates in a multipass
manner. The selection and application of code templates occurs at four different times during the compilation process:
(1) The pattern select or PATSELECT phase does a pattern match in the CONTEXT pass to select the best code
templates; (2) The TNASSIGN and TNLIFE tasks of the CONTEXT pass use context actions of the selected templates
to analyze the evaluation order of expressions and to allocate temporary names (TNs) with lifetimes nonlocal to the
code templates; (3) The TNBIND pass uses the binding actions of the selected templates to allocate TNs with lifetimes
local to the code templates; (4) Finally, the CODE pass uses code generation actions of the selected templates to
guide the generation of object code.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention 
itself, however, as well as other features and advantages thereof, will be best understood by reference to the detailed 
description of specific embodiments which follows, when read in conjunction with the accompanying drawings, wherein:

Figure 1 is a schematic representation of a compiler using features of the invention; 

Figure 2 is an electrical diagram in block form of a host computer upon which the methods of various features of
the invention may be executed;

Figure 3 is a diagrammatic representation of code to be compiled by the compiler of Figure 1, in source code form,
intermediate language form, tree form, and assembly language form;

Figure 4 is a diagrammatic representation of the data structure of a tuple used in the compiler of Figure 1;

Figure 5 is a logic flow chart of the operation of the shell of Figure 1;

Figure 6 is an example listing of code containing constants; and

Figure 7 is a diagram of data fields and relationships (pointers) for illustrating type definition according to one
feature of the invention.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS 

Referring to Figure 1, the compiler framework 10 according to one embodiment of the invention is a
language-independent framework for the creation of portable, retargetable compilers. The compiler framework 10 consists of a
portable operating system interface referred to as the shell 11 and a retargetable optimizer and code generator 12 (the
back end). The shell 11 is portable in that it can be adapted to function with any of several operating systems such as
VAX/VMS, Unix, etc., executing on the host computer. The shell operates under this host operating system 13 executing
on a host computing system as seen in Figure 2, typically including a CPU 14 coupled to a main memory 15 by a
system bus 16, and coupled to disk storage 17 by an I/O controller 18. The shell 11 and compiler 12 may be combined
with a front end 20 to create a portable, retargetable compiler for a particular source language. Thus, a compiler based
on the framework 10 of the invention consists of three basic parts: a shell 11 which has been tailored for a particular
host operating system 13 - this determines the host environment of the compiler; a front end 20 for a particular source
language (e.g., C, C++, Pascal, Fortran, Ada, Cobol, etc.) - this determines the source language of the compiler; and
a back end 12 for a particular target machine (i.e., a particular architecture such as VAX, RISC, etc.) - this determines
the target machine of the compiler.

Since the interfaces between the shell 11, the front end 20, and the back end 12 are fixed, the individual components
of a compiler produced according to the invention may be replaced freely. That is, the front end 20 may consist of a
number of interchangeable front ends, e.g., one for Fortran, one for Cobol, one for Pascal, one for C, etc. Likewise, a
shell 11 tailored for running under VMS on a VAX computer may be replaced by a shell 11 running under the Unix
operating system on a RISC workstation, while the front end 20 and back end 12 remain the same.

The shell 11 provides a fixed interface between the host operating system 13 and the rest of the compiler. The
shell provides several advantages according to the invention. First, the shell 11 provides a portable interface to basic
features of the operating system 13. For example, the front end 20 need not know details of the file system, command
parsing, or heap storage allocation under the host operating system 13, since all these services are accessed through
shell routines, and the shell is tailored to the operating system 13 being used. Second, the shell 11 eliminates duplication
of effort by providing a single implementation of some common compiler components, such as command line parsing,
include-file processing, and diagnostic file generation. Third, the use of these common components also guarantees
consistency among compilers created using the framework 10; all compilers created using this framework 10 will write
listing files in the same format, will treat command line qualifiers the same, will issue similar-looking error messages,
etc. Fourth, having common shell utilities in the shell 11 improves the internal integration of the compiler, since the
front and back ends 20 and 12 use the same shell functions. For example, use of the shell locator package means that
source file locations can be referred to consistently in the source listing, front-end generated diagnostics, back-end
generated diagnostics, the object listing, and the debugger information.

The front end 20 is the only component of a compiler created by the framework 10 which understands the source
language being compiled. This source language is that used to generate the text of a source code file or files (module
or modules) 21 which define the input of the compiler. The front end 20 performs a number of functions. First, it calls
the shell 11 to obtain command line information and text lines from the source files 21. Second, the front end 20 calls
the shell 11 to control the listing file, write diagnostic messages, and possibly to write other files for specific languages.
Third, the front end 20 does lexical, syntactic, and semantic analysis to translate the source text in file 21 to a
language-independent internal representation used for the interface 22 between the front end 20 and the back end 12. Fourth,
the front end 20 invokes the back end 12 to generate target system object code 23 from the information in the internal
representation. Fifth, the front end 20 provides routines which the back end 12 calls via call path 24 to obtain
language-specific information during back end processing. Not included in the compiler framework of Fig. 1 is a linker which links
the object code modules or images 23 to form an executable image to run on the target machine 25.

The target machine 25 for which the back end 12 of the compiler creates code is a computer of some specific
architecture, i.e., it has a register set of some specific number and data width, the logic executes a specific instruction
set, specific addressing modes are available, etc. Examples are (1) the VAX architecture, as described in (2) a RISC
type of architecture based upon the 32-bit RISC chip available from MIPS, Inc., as part number R2000 or R3000 and
described by Lane in "MIPS R2000 RISC Architecture", Prentice-Hall, 1987, and (3) an advanced RISC architecture
with 64-bit registers as described in copending application Serial No. 547,589, filed June 29, 1990. Various other
architectures would be likewise accommodated.

In general, the front end 20 need not consider the architecture of the target machine 25 upon which the object
code 23 will be executed, when the front end 20 is translating from source code 15 to the internal representation of
interface 22, since the internal representation is independent of the target machine 25 architecture. Some aspects of
the front end 20 may need to be tailored to the target system, however; for example, aspects of the data representation
such as allocation and alignment, might be customized to fit the target machine 25 architecture better, and routine call
argument mechanisms may depend on the target system calling standard, and further the runtime library interface will
probably be different for each target system.

The back end 12 functions to translate the internal representation 22 constructed by the front end 20 into target
system object code 23. The back end 12 performs the basic functions of optimization 26, code generation 27, storage
and register allocation 28, and object file emission 29. The optimization function is performed on the code when it is
in its internal representation. The back end 12 also includes utility routines which are called by the front end 20 to
create a symbol table 30 and intermediate language data structures.

When the user (that is, a user of the computer system of Figure 2, where the computer system is executing the
operating system 13) invokes the compiler of Figure 1 (through a callable interface, or some other mechanism), the
shell 11 receives control. The shell 11 invokes the front end 20 to compile an input stream from source file 15 into an
object file 23. The front end 20 invokes the back end 12 to produce each object module within the object file 23. The
front end 20 may invoke the back end 12 to create code for each individual routine within an object module 23, or it
may call a back end driver which will generate code for an entire module at once.

The front end 20 parses the source code 21 and generates an intermediate language version of the program
expressed in the source code. The elemental structure of the intermediate language is a tuple. A tuple is an expression
which in source language performs one operation. For example, referring to Figure 3, the expression

I = J + 1

as represented in source language is broken down into four tuples for representation in the intermediate language,
these being numbered $1, $2, $3 and $4. This way of expressing the code in IL includes a first tuple $1 which is a
fetch represented by an item 31, with the object of the fetch being a symbol J. The next tuple is a literal, item 32, also
making reference to a symbol "1". The next tuple is an Add, item 33, which makes reference to the results of tuples
$1 and $2. The last tuple is a store, item 34, referencing the result of tuple $3 and placing the result in symbol I in the
symbol table. The expression may also be expressed as a logic tree as seen in Figure 3, where the tuples are identified
by the same reference numerals. This same line of source code could be expressed in assembly for a RISC type target
machine, as three instructions LOAD, ADD integer, and STORE, using some register such as REG 4 in the register
file, in the general form seen in Figure 3. Or, for a CISC machine, the code emitted may be merely a single instruction,
ADD #1,J,I as also seen in the Figure.

A tuple, then, is the elemental expression of a computer program, and in the form used in this invention is a data
structure 35 which contains at least the elements set forth in Figure 4, including (1) an operator and type field 36,
e.g., Fetch, Store, Add, etc., (2) a locator 37 for defining where in the source module 21 the source equivalent to the
tuple is located, (3) operand pointers 38 to other tuples, to literal nodes or symbol nodes, such as the pointers to J and
#1 from tuples $1 and $2 in Figure 3. A tuple also has attribute fields 39, which may include, for example, Label, Conditional
Branch, Argument (for Calls), or SymRef (a symbol in the symbol table). The tuple has a number field 40, representing
the order of this tuple in the block.
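
For illustration, the tuple structure and the Figure 3 example (fetch J, literal 1, add, store to I) may be sketched in Python as follows. The class ILTuple, its field names, the operator spellings and the evaluate function are assumptions made for this sketch and are not taken from the GEM specification files; the comments indicate which numbered field of Figure 4 each attribute loosely corresponds to. The small interpreter is included only to check that the four tuples encode the intended expression; a compiler would translate the tuples rather than execute them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ILTuple:
    number: int                      # order within the block (number field 40)
    operator: str                    # operator and type (field 36)
    operands: List[Any] = field(default_factory=list)         # operand pointers (38)
    locator: Optional[int] = None    # source position (locator 37), unused here
    attributes: Dict[str, Any] = field(default_factory=dict)  # attribute fields (39)


# The four tuples of the Figure 3 example.
t1 = ILTuple(1, "FETCH", ["J"])      # $1: fetch symbol J
t2 = ILTuple(2, "LITERAL", [1])      # $2: literal 1
t3 = ILTuple(3, "ADD", [t1, t2])     # $3: add the results of $1 and $2
t4 = ILTuple(4, "STORE", ["I", t3])  # $4: store the result of $3 in symbol I


def evaluate(tup: ILTuple, symbols: Dict[str, int]) -> Optional[int]:
    """Interpret a tuple against a symbol table mapping names to values."""
    if tup.operator == "FETCH":
        return symbols[tup.operands[0]]
    if tup.operator == "LITERAL":
        return tup.operands[0]
    if tup.operator == "ADD":
        left, right = (evaluate(op, symbols) for op in tup.operands)
        return left + right
    if tup.operator == "STORE":
        target, value = tup.operands
        symbols[target] = evaluate(value, symbols)
        return None
    raise ValueError(f"unknown operator {tup.operator!r}")


symbols = {"J": 41}
evaluate(t4, symbols)  # walks $4 -> $3 -> ($1, $2) and writes symbol I
```

Because tuple $3 points at tuples $1 and $2, and tuple $4 points at tuple $3, the operand pointers alone are sufficient to recover the logic tree of Figure 3.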

The front end 20 parses the source code to identify tuples, then to identify basic blocks of code. A block of code
is defined to be a sequence of tuples with no entry or exit between the first and last tuple. Usually a block starts with
a label or routine entry and ends with a branch to another label. A task of the front end 20 is to parse the source code
21 and identify the tuples and blocks, which of course requires the front end to be language specific. The tuple thus
contains fields 41 that say whether or not this tuple is the beginning of a block, or the end of a block.
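
The division of a tuple sequence into blocks may be sketched as follows. This is a simplified illustration and not the front end's actual procedure: tuples are reduced to operator strings, and the spellings "LABEL" and "BRANCH" and the function name split_into_blocks are assumed for the example.

```python
def split_into_blocks(ops):
    """Partition a list of operator strings into blocks.

    A label may only appear as the first tuple of a block, and a branch
    may only appear as the last, so that no entry or exit occurs between
    the first and last tuple of any block.
    """
    blocks, current = [], []
    for op in ops:
        if op.startswith("LABEL") and current:
            blocks.append(current)  # a label is an entry point: close the open block
            current = []
        current.append(op)
        if op.startswith("BRANCH"):
            blocks.append(current)  # a branch is an exit: nothing may follow it
            current = []
    if current:
        blocks.append(current)      # trailing tuples form the final block
    return blocks


ops = ["LABEL L1", "FETCH J", "ADD", "BRANCH L2",
       "LABEL L2", "STORE I",
       "LABEL L3", "FETCH K"]
blocks = split_into_blocks(ops)
```

In this example the sequence falls into three blocks, the second of which ends without a branch because the following tuple is a label.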

As discussed in more detail below, one feature of the invention is a method of representing effects. A tuple has
effects if it stores or writes to a memory location (represented at the IL level as a symbol), or is dependent upon what
another tuple writes to a location. Thus, in the example given in Figure 3, tuple $4 has an effect (store to I) and tuple
$1 has a dependency (content of J). Thus the tuple data structure as represented in Figure 4 has fields 42 and 43 to
store the effects and dependencies of this tuple.
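
The use an optimizer might make of such fields can be sketched as follows. This is a deliberately simplified model assumed for illustration: effects and dependencies are represented as plain sets of symbol names, and the function may_interfere is a hypothetical name. The mechanism of the invention, described in more detail below, is more general than a set intersection.

```python
def may_interfere(a, b):
    """Return True if reordering tuples a and b could change a computed value.

    Each tuple is modeled as a dict with an "effects" set (symbols written)
    and a "dependencies" set (symbols read).
    """
    return bool(
        a["effects"] & b["dependencies"]     # a writes what b reads
        or b["effects"] & a["dependencies"]  # b writes what a reads
        or a["effects"] & b["effects"]       # both write the same symbol
    )


fetch_j = {"effects": set(), "dependencies": {"J"}}  # tuple $1 of Figure 3
store_i = {"effects": {"I"}, "dependencies": set()}  # tuple $4 of Figure 3
store_j = {"effects": {"J"}, "dependencies": set()}  # a hypothetical store to J

independent = not may_interfere(fetch_j, store_i)
conflicting = may_interfere(fetch_j, store_j)
```

Under this model the fetch of J and the store to I do not interfere, whereas a store to J could not be moved across the fetch of J.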

A single execution of a compiler of Figure 1 is driven by the shell 11 as illustrated in the flow chart of Figure 5. As
indicated by the item 45 of Figure 5, the shell 11 receives control when the compiler of Figure 1 is invoked by the user
via the operating system 13. The user in a command line specifies a "plus-list" or list of the modules 21 to be operated
upon. The next step is calling by the shell 11 of a front-end routine GEM$XX_INIT which does any necessary
initialization for the front end, indicated by the item 46. This front end routine GEM$XX_INIT is described in the Appendix.
Next, the shell 11 parses the global command qualifiers and calls the front end routine
GEM$XX_PROCESS_GLOBALS, as indicated by the item 47. Then, for each "plus-list" (comma-separated entity) in
the command line used at the operating system 13 level to invoke the compiler, the shell executes a series of actions;
this is implemented by a loop using a decision point 48 to check the plus-list. So long as there is an item left in the
plus-list, the actions indicated by the items 49-52 are executed. These actions include accessing the source files 21
specified in the command line and creating an input stream for them, indicated by the item 49, then parsing the local
qualifiers (specific to that plus-list), calling GEM$XX_PROCESS_LOCALS to do any front-end determined processing
on them, and opening the output files specified by the qualifiers, indicated by the item 50. The actions in the loop further
include calling the front-end routine GEM$XX_COMPILE to compile the input stream, indicated by the item 51, then
closing the output files, item 52. When the loop falls through, indicating all of the plus-list items have been processed,
the next step is calling the front end routine GEM$XX_FINI to do any front-end cleanup, indicated by item 53. Then,
the execution is terminated, returning control to the invoker, item 54.
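
The calling sequence of Figure 5 may be sketched in Python as follows. The class RecordingFrontEnd, the function run_shell and the method names are hypothetical stand-ins for the GEM$XX_ entry points, whose actual interfaces are given in the Appendix; qualifier parsing and file handling are omitted, and the sketch assumes that the files within one plus-list are joined by "+" and that plus-lists are separated by commas.

```python
class RecordingFrontEnd:
    """A stand-in front end that records the order in which it is called."""

    def __init__(self):
        self.calls = []

    def init(self):                      # stands in for GEM$XX_INIT
        self.calls.append("INIT")

    def process_globals(self):           # stands in for GEM$XX_PROCESS_GLOBALS
        self.calls.append("PROCESS_GLOBALS")

    def process_locals(self, sources):   # stands in for GEM$XX_PROCESS_LOCALS
        self.calls.append(("PROCESS_LOCALS", sources))

    def compile(self, sources):          # stands in for GEM$XX_COMPILE
        self.calls.append(("COMPILE", sources))

    def fini(self):                      # stands in for GEM$XX_FINI
        self.calls.append("FINI")


def run_shell(command_line, front_end):
    """Drive one compiler execution in the order of items 46-53 of Figure 5."""
    front_end.init()                          # item 46
    front_end.process_globals()               # item 47
    for plus_list in command_line.split(","):  # decision point 48
        sources = plus_list.split("+")        # item 49: one input stream
        front_end.process_locals(sources)     # item 50
        front_end.compile(sources)            # item 51
        # item 52: output files would be closed here
    front_end.fini()                          # item 53


fe = RecordingFrontEnd()
run_shell("a.for+b.for,c.for", fe)
```

With two plus-lists on the command line, the local-qualifier and compile routines are each called twice, bracketed by a single initialization and a single cleanup call.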

The shell 11 calls GEM$XX_COMPILE to compile a single input stream. An input stream represents the
concatenation of the source files or modules 21 specified in a single "plus list" in the compiler command line, as well as any
included files or library text. By default, compiling a single input stream produces a single object file 23, although the
compiler does allow the front end 20 to specify multiple object files 23 during the compilation of an input stream.

Before calling GEM$XX_COMPILE, the shell 11 creates the input stream, parses the local qualifiers, and opens 
the output files. After calling GEM$XX_COMPILE, it closes all the input and output files. 

The front end 20 (GEM$XX_COMPILE and the front-end routines that are called from it) reads source records 21
from the input stream, translates them into the intermediate representation of interface 22 (including tuples, blocks,
etc. of the intermediate language graph, and the symbol table) and invokes the back end 12 to translate the intermediate
representation into object code in the object file 23.

An object file 23 may contain any number of object modules. Pascal creates a single object module for an entire 
input stream (a MODULE or PROGRAM). FORTRAN (in one embodiment) creates a separate object module for each 
END statement in the input stream. BLISS creates an object module for each MODULE. 

To create an object module 23, the front end 20 translates the input stream or some subsequence thereof (which
we can call a source module 21) into its internal representation for interface 22, which consists of a symbol table 30
for the module and an intermediate language graph 55 for each routine. The front end 20 then calls back end routines
to initialize the object module 23, to allocate storage for the symbols in the symbol table 30 via storage allocation 28,
to initialize that storage, to generate code for the routines via emitter 29, and to complete the object module 23.

The compiler is organized as a collection of packages, each of which defines a collection of routines or data
structures related to some aspect of the compilation process. Each package is identified by a two-letter code, which
is generally an abbreviation of the package function. The interface to a package is defined by a specification file. If a
package is named ZZ, then its specification file will be GEM$ZZ.SDL.

Any symbol which is declared in a package's specification file is said to be exported from that package. In general, 
symbols exported from package ZZ have names beginning with GEM$ZZ_. The specific prefixing conventions for 
global and exported names are set forth in Table 1.

The shell 11 is a collection of routines to support common compiler activities. The shell components are interrelated,
so that a program that uses any shell component gets the entire shell. It is possible, however, for a program to use the
shell 11 without using the back end 12. This can be a convenient way of writing small utility programs with
production-quality features (input file concatenation and inclusion, command line parsing, diagnostic file generation, good listing
files, etc.). Note that the shell 11 is actually the "main program" of any program that uses it, and that the body of the
application must be called from the shell 11 using the conventions described below. To use a shell package ZZ from
a BLISS program, a user does a LIBRARY 'GEM$ZZ'. To use the shell from other languages, a user must first translate
the shell specification files into the implementation language.

The shell packages are summarized in the following paragraphs; they are documented in their specification files
(the GEM$ZZ.SDL files) in the Appendix. Most shell routine arguments (e.g., integer, string, etc.) fall into one of the
categories set forth in Table 2.

The interface from the shell 11 to the front end 20 has certain requirements. Since the shell 11 receives control
when a compiler of Figure 1 is invoked, a front end 20 must declare entry points so the shell 11 can invoke it and
declare global variables to pass front end specific information to the shell 11. The front end 20 provides the global
routines set forth in Table 3, in one embodiment. These routines have no parameters and return no results.

The Virtual Memory Package (GEM$VM): The virtual memory package provides a standard interface for allocating
virtual memory. It supports the zoned memory concept of the VMS LIB$VM facility; in fact, under VMS, GEM$VM is
an almost transparent layer over LIB$VM. However, the GEM$VM interface is guaranteed to be supported unchanged
on any host system.

The Locator Package (GEM$LO): A locator describes a range of source text 21 (starting and ending file, line, and
column number). The text input package returns locators for the source lines that it reads. Locators are also used in
the symbol table 30 and intermediate language nodes 43 to facilitate message and debugger table generation, and
are used for specifying where in the listing file the listing package should perform actions. A locator is represented as
a longword. The locator package maintains a locator database, and provides routines to create and interpret locators.
There is also a provision for user-created locators, which allow a front end to create its own locators to describe program
elements which come from a non-standard source (for example, BLISS macros or Ada generic instantiation).
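
One way such a locator database could be organized is sketched below. The encoding of a locator is not described above, so this sketch simply assumes that a locator is an integer index into a table of source ranges, checked to fit in a 32-bit longword; the class LocatorDatabase and its methods create and interpret are hypothetical names and not the GEM$LO routines.

```python
class LocatorDatabase:
    """Hands out integer locators and maps them back to source ranges."""

    def __init__(self):
        self._ranges = []  # index in this list == locator value

    def create(self, start, end):
        """Register a source range and return its locator.

        start and end are (file, line, column) triples.
        """
        locator = len(self._ranges)
        if locator >= 2 ** 32:
            raise OverflowError("locator no longer fits in a longword")
        self._ranges.append((start, end))
        return locator

    def interpret(self, locator):
        """Return the (start, end) range that a locator describes."""
        return self._ranges[locator]


db = LocatorDatabase()
loc = db.create(("main.pas", 10, 1), ("main.pas", 10, 24))
start, end = db.interpret(loc)
```

Keeping the range in a database and passing only a small integer allows every tuple and symbol table entry to carry a source position in a single fixed-size field.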

The Text Input Package (GEM$TI): The text input package supports concatenated source files 21, nested (included)
source files 21, and default and related file specs, while insulating the front end 20 from the I/O architecture of the
underlying operating system 13. Text of a source file 21 is read a line at a time. The text input package GEM$TI colludes
with the locator package GEM$LO to create a locator describing each source line it reads.

The Text Output Package (GEM$TX): The text output package supports output to any number of output files 44
simultaneously. Like the text input package, it insulates its caller from the operating system 13. It will write strings
passed by reference or descriptor. It provides automatic line wrapping and indentation, page wrapping, and callbacks
to a user-provided start-of-page routine.

The Listing Package (GEM$LS): The listing package will write a standard format listing file containing a copy of
the source files 21 (as read by the text input package GEM$TI), with annotations provided by the front end 20 at
locations specified with locators. The listing file is created as a GEM$TX output file 44, which the front end 20 may
also write to directly, using the GEM$TX output routines.

The Internal Representation

The internal representation of a module 21 comprises a symbol table 30 for the module and a compact intermediate
language graph 55, or CILG, for each routine in source module 21. These are both pointer-linked data structures made
up of nodes.

Nodes, according to the framework of Figure 1, will be defined. Almost all data structures used in the interface
between the front and back ends 20 and 12 (and most of the data structures used privately by the back end 12) are
nodes. A node as the term is used herein is a self-identifying block of storage, generally allocated from the heap with
GEM$VM_GET. All nodes have the aggregate type GEM$NODE, with fields GEM$NOD_KIND and
GEM$NOD_SUBKIND. Kind is a value from the enumerated type GEM$NODE_KINDS which identifies the general
kind of the node. Subkind is a value from the enumerated type GEM$NODE_SUBKINDS which identifies the particular
kind of the node within the general class of nodes specified by kind. Any particular node also has an aggregate type
determined by its kind field. For example, if kind is GEM$NODE_K_SYMBOL, then the node has type
GEM$SYMBOL_NODE. Note that the types associated with nodes do not obey the naming conventions described
above. The interface node types and their associated enumerated type constants are defined in the files set forth in
Table 4.
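
The self-identifying node scheme can be illustrated with a minimal Python sketch. This is not the GEM definition: the enumeration members, the `SymbolNode` layout, and the helper names `make_symbol` and `describe` are all assumptions chosen for illustration; only the idea of a common header with a kind and a subkind field comes from the text.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NodeKind(Enum):
    """Stands in for GEM$NODE_KINDS (members are illustrative only)."""
    SYMBOL = auto()
    BLOCK = auto()
    TUPLE = auto()


class NodeSubkind(Enum):
    """Stands in for GEM$NODE_SUBKINDS (members are illustrative only)."""
    NONE = auto()
    VARIABLE = auto()
    ROUTINE = auto()


@dataclass
class Node:
    """Common header shared by every node: the kind and subkind fields."""
    kind: NodeKind
    subkind: NodeSubkind


@dataclass
class SymbolNode(Node):
    """Aggregate type selected when kind is SYMBOL (hypothetical layout)."""
    name: str = ""


def make_symbol(name, subkind):
    """Hypothetical helper: build a symbol node with its header filled in."""
    return SymbolNode(NodeKind.SYMBOL, subkind, name)


def describe(node):
    """Dispatch on the kind field, as a consumer of the interface would."""
    if node.kind is NodeKind.SYMBOL:
        return f"symbol {node.name} ({node.subkind.name.lower()})"
    return f"{node.kind.name.lower()} node"
```

Because every node carries its own kind, code that walks the pointer-linked structures can decide how to interpret a node without outside context.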

The compiler framework of Figure 1 supports a simple tree-structured symbol table 30, in which symbol nodes are
linked together in chains off of block nodes, which are arranged in a tree. All symbolic information to be used by the
compiler must be included in this symbol table 30. There are also literal nodes, representing literal values of the compiled
program; frame nodes, representing storage areas (PSECTs and stack frames) where variables may be allocated; and
parameter nodes, representing elements in the parameter lists of routine entry points. The symbol table structure and
the contents of symbol table nodes are described below.

The intermediate language is the language used for all internal representations of the source code 21. The front
end 20 describes the code of a routine to be compiled as a compact intermediate language graph 55, or CILG. This
is simply a linked list of CIL tuple nodes 35 of Figure 4 (also referred to as tuple nodes, or simply as tuples), each of
which represents an operation and has pointers 38 to the tuple nodes representing its operands. Tuple nodes may
also contain pointers 38 to symbol table nodes. The intermediate language is described in more detail below.

The front end 20 must create the internal representation 22 of the module 21 one node at a time, and link the
nodes together into the symbol table 30 and IL data structures 55. The routines and macros of Table 5, also documented
in the Appendix, are used to create and manipulate the data structures of the internal representation 22.

The back end 12 makes no assumptions about how the front end 20 represents block and symbol names. Instead,
the front end 20 is required to provide a standard call-back interface that the back end 12 can use to obtain these names.

Every symbol node has a flag, GEM$SYM_HAS_NAME, and every block node has a flag, GEM$BLK_HAS_NAME.
When the front end 20 initializes a symbol or block node, it must set its has-name flag to indicate whether a name
string is available for it. (Some symbols and blocks, such as global and external symbols and top level module blocks,
must have names.)

There is a global variable, GEM$ST_G_GET_NAME, in the ST package. Before invoking the back end, the front
end must set this variable to the address of a callback routine which fits the description set forth in Table 5.
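
The name call-back arrangement can be sketched in Python as follows. This is an illustration under stated assumptions, not the interface of Table 5: the callback signature, the `front_end_id` field, and the helper names are hypothetical; only the has-name flag and the global variable holding the routine come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SymbolNode:
    has_name: bool       # models the GEM$SYM_HAS_NAME flag
    front_end_id: int    # hypothetical key into the front end's own name storage


# Models the global variable GEM$ST_G_GET_NAME. The real callback signature is
# given in Table 5; a one-argument function is assumed here.
st_g_get_name: Optional[Callable[[SymbolNode], str]] = None

# The front end may store names however it likes; the back end never sees this.
_FRONT_END_NAMES = {1: "MAIN", 2: "COUNT"}


def front_end_get_name(node):
    """Front end's callback: translate its private representation to a string."""
    return _FRONT_END_NAMES[node.front_end_id]


def back_end_name_of(node):
    """Back end side: consult the flag, then go through the global callback."""
    if not node.has_name:
        return None
    return st_g_get_name(node)


# The front end installs its routine before invoking the back end.
st_g_get_name = front_end_get_name
```

The point of the indirection is that the back end depends only on the flag and the callback, never on how names are stored.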

To compile a source module using the GEM$CO_COMPILE_MODULE interface, a front end (that is, the routine
GEM$XX_COMPILE) does (in order) each of the activities described in the following paragraphs.

1. Create the Internal Representation 

The first task of the front end 20 is to create the internal representation 22 of the source module. To begin with, it
must call GEM$ST_INIT to initialize the symbol table 30 and associated virtual memory zones. It must then read the
source module 21 from the input stream, using the GEM$TI package; do lexical, syntactic, and semantic analysis of
the source module 21; and generate the symbol table 30 and the intermediate language graphs 55 for the module as
described above, using the GEM$ST and GEM$IL routines which are documented in the Appendix.

In addition, the module's source listing may be annotated with calls to the GEM$LS shell package, and errors in
the module may be reported with calls to the GEM$MS package.

If the source module 21 contains errors severe enough to prevent code generation, then the front end 20 should
now call GEM$LS_WRITE_SOURCE to write the listing file and GEM$ST_FINI to release all the space allocated for
the internal representation 22. Otherwise, it must proceed with the following steps.

2. Specify the Callback Routines

Before calling the back end 12 to compile the module 21, the front end 20 must initialize the following global
variables with the addresses of routines that will be called by the back end 12.

(1) GEM$ST_G_GET_NAME must be initialized to the address of a routine that will yield the names of symbol
and block nodes in the symbol table 30, as described above.

(2) The GEM$SE_G global variables must be initialized to the addresses of routines that will do source-language
defined side effect analysis, as described below. The compiler provides a predefined collection of side effect
routines, suitable for use during the early development of a front end 20, which can be selected by calling
GEM$SE_DEFAULT_IMPLEMENTATION.

(3) GEM$ER_G_REPORT_ROUTINE contains the address of the front end's routine for reporting back end
detected errors, as described below.

3. Do the Compilation

When the internal representation is complete, the front end 20 can call GEM$CO_COMPILE_MODULE (described
below) to translate it into target machine object representation 23. The front end should then call
GEM$LS_WRITE_SOURCE to list the input stream in the listing file. It may also call GEM$MU_LIST_MACHINE_CODE
to produce an assembly code listing of the compiled module 23.

Note that normally, GEM$LS_WRITE_SOURCE has to be called after GEM$CO_COMPILE_MODULE so that
the source listing 21 can be annotated with any error messages generated during back end processing. However, it is
a good idea for the front end 20 to provide a debugging switch which will cause GEM$LS_WRITE_SOURCE to be
called first. This will make it possible to get a source listing even if a bug causes the compiler to abort during back end
processing.

4. Clean Up

When compilation is complete, the front end 20 must call GEM$CO_COMPLETE_MODULE to release the space
used for back end processing, and then GEM$ST_FINI to release the space used for the internal representation.
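
The ordering of the four activities can be summarized with a small Python sketch. It does not call any real GEM routine; it only records, as strings, the sequence of calls that a hypothetical front end would make, including the early exit taken when the source contains severe errors. The function name `compile_module` and its parameters are assumptions.

```python
def compile_module(has_fatal_errors, log):
    """Record the order of GEM calls made by a hypothetical front end."""
    call = log.append

    # 1. Create the internal representation.
    call("GEM$ST_INIT")
    # ... read the source with GEM$TI, build the symbol table and the CILGs ...
    if has_fatal_errors:
        # Errors prevent code generation: list the source and release storage.
        call("GEM$LS_WRITE_SOURCE")
        call("GEM$ST_FINI")
        return log

    # 2. Specify the callback routines (modeled here as log entries only).
    call("set GEM$ST_G_GET_NAME")
    call("set GEM$SE_G_xxx")
    call("set GEM$ER_G_REPORT_ROUTINE")

    # 3. Do the compilation; the listing follows so that it can carry
    #    messages produced during back end processing.
    call("GEM$CO_COMPILE_MODULE")
    call("GEM$LS_WRITE_SOURCE")

    # 4. Clean up: back end storage first, then the internal representation.
    call("GEM$CO_COMPLETE_MODULE")
    call("GEM$ST_FINI")
    return log
```

Calling `compile_module(False, [])` returns the full sequence, while `compile_module(True, [])` returns only the initialization, listing, and cleanup calls.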

The back end 12 is able to detect conditions during compilation which are likely to represent conditions in the
source program 21 which ought to be reported to the user, such as uninitialized variables, unreachable code, or conflicts
of static storage initialization. However, a particular front end 20 may need to customize which of these conditions will
be reported, or the precise messages that will be issued.

To allow this, the back end 12 reports all anomalous conditions that it detects by calling the routine whose address
is in the global variable GEM$ER_G_REPORT_ROUTINE, with the argument list described below. This routine is
responsible for actually issuing the error message.

There is a default error reporting routine set forth in the Appendix named GEM$ER_REPORT_ROUTINE, whose
address will be in GEM$ER_G_REPORT_ROUTINE unless the front end has stored the address of its own report
routine there. This default routine has three uses:

(1) The default routine provides reasonable messages, so the front end developers are not obliged to provide their
own routine unless and until they need to customize it.

(2) When the front end developers do choose to write a report routine, they can use the default routine as a model. 

(3) The front end's routine can be written as a filter, which processes (or ignores) certain errors itself, and calls the 
default routine with all others. 
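
The third use, a front end routine written as a filter, might look like the following Python sketch. The argument list `(code, text)`, the condition names, and the message wording are all assumptions made for illustration; the actual argument list is the one defined for GEM$ER_G_REPORT_ROUTINE.

```python
# Messages issued so far, collected in a list so the behavior can be inspected.
reported = []


def default_report_routine(code, text):
    """Stands in for GEM$ER_REPORT_ROUTINE: issue a generic message."""
    reported.append(f"{code}: {text}")


# Conditions this hypothetical front end chooses not to report at all.
IGNORED = {"UNREACHABLE_CODE"}


def front_end_report_routine(code, text):
    """Filter: drop some conditions, reword one, delegate all others."""
    if code in IGNORED:
        return
    if code == "UNINITIALIZED":
        reported.append(f"variable {text} is used before it is assigned")
        return
    default_report_routine(code, text)


# Models the global GEM$ER_G_REPORT_ROUTINE, which holds the default routine
# unless the front end stores the address of its own routine there.
er_g_report_routine = front_end_report_routine


def back_end_detects(code, text):
    """Back end side: every detected condition goes through the global."""
    er_g_report_routine(code, text)
```

With this arrangement the front end only writes code for the conditions it wants to treat specially, and everything else keeps the default wording.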

INTERFACE FOR REPRESENTING EFFECTS 

As an essential step in detecting common subexpressions (CSEs), invariant expressions, and opportunities for
code motion, the optimizer 26 in the back end 12 must be able to determine when two expression tuples are guaranteed
to compute the same value. The basic criterion is that an expression B computes the same value as an expression A if:

1. A and B are literal references to literals with the same value, CSE references to the same CSE, or symbol
references to the same symbol; or

2. a. A is evaluated on every control flow path from the start of the routine to B, and

   b. A and B have the same operator and data type, and

   c. the operands of B compute the same values as the corresponding operands of A (obviously a recursive
   definition), and

   d. no tuple which occurs on any path from an evaluation of A to an evaluation of B can affect the value
   computed by B.

The optimizer 26 of Fig. 1 can validate criteria 1, 2a, 2b, and 2c by itself; but criterion 2d depends on the semantics
of the language being compiled, i.e., the language of source code module 21. But since the compiler back end 12
must be language-independent, a generic interface is provided to the front end 20 to convey the necessary
information. When can the execution of one tuple affect the value computed by another tuple? The interface 22 must allow
the optimizer 26 to ask this question, and the compiler front end 20 to answer it.

The model underlying this interface 22 is that some tuples have effects, and that other tuples have dependencies.
A tuple has an effect if it might change the contents of one or more memory locations. A tuple has a dependency on
a memory location if the value computed by the tuple depends on the contents of the memory location. Thus, the
execution of one tuple can affect the value computed by another tuple if it has the effect of modifying a memory location
which the other tuple depends on.

Given the ramifications of address arithmetic and indirect addressing, it is impossible in general to determine the
particular memory location accessed by a tuple. Thus we must deal with heuristic approximations to the sets of memory
locations which might possibly be accessed.

The actual interface 22 provides two mechanisms for the front end 20 to communicate dependency information to
the optimizer 26. These are the straight-line dependency interface and the effects-class interface.

In the straight-line dependency interface, to determine dependencies in straight-line code, the optimizer 26 will
ask the front end 20 to (1) push tuples on an effects stack and pop them off again, and (2) find the top-most tuple on
the effects stack whose execution might possibly affect the value computed by a specified tuple.

The straight-line mechanism is not appropriate when the optimizer 26 needs to compute what effects might occur
as a result of program flow through arbitrary sets of flow paths. For this situation, the front end 20 is allowed to define
a specified number (initially 128) of effects classes, each representing some (possibly indeterminate) set of memory
locations. A set of effects classes is represented by a bit vector. For example, an effects class might represent the
memory location named by a particular variable, the set of all memory locations which can be modified by procedure
calls, or the set of memory locations which can be accessed by indirect references (pointer dereferences).

For the effects-class interface, the optimizer will ask the front end to (1) compute the set of effects classes containing
memory locations which might be changed by a particular tuple, and (2) compute the set of effects classes containing
memory locations which a particular tuple might depend on.

Using this effects-class interface, the optimizer can compute, for each basic block, a bit-vector (referred to as the
LDEF set) which represents the set of effects classes containing memory locations which can be modified by some
tuple in that basic block.

The optimizer will also ask the front end to (3) compute the set of effects classes which might include the memory
location associated with a particular variable symbol.

This information is used by the split lifetime optimization phase (see below) to compute the lifetime of a split
candidate.

The optimizer 26 uses these interfaces as follows. Remember that the reason for these interfaces is to allow the
optimizer 26 in back end 12 to determine when "no tuple which occurs on any path from an evaluation of A to an
evaluation of B can affect the value computed by B." If A and B occur in the same basic block, this just means "no
tuple between A and B can change the value computed by B." This can be easily determined using the straight-line
dependency interface.

If the basic block containing A dominates the basic block containing B (i.e., every flow path from the routine entry
node to the basic block containing B passes through the basic block containing A), then the optimizer finds the series
of basic blocks X1, X2, ... Xn, where X1 is the basic block containing A, Xn is the basic block containing B, and each
Xi immediately dominates X(i+1). Then the test has two parts:

1. There must be no tuple between A and the end of basic block X1, or between the beginning of basic block Xn
and B, or in any of the basic blocks X2, X3, ... X(n-1), which can change the value computed by B. This can be
easily determined using the straight-line dependency interface.

2. There must be no flow path between two of the basic blocks Xi and X(i+1) which contains a tuple which can
change the value computed by B. The optimizer tests this with the effects-class mechanism, by computing the
union of the LDEF sets of all the basic blocks which occur on any flow path from Xi to X(i+1), computing the
intersection of this set with the set of effects classes containing memory locations that B might depend on, and
testing whether this intersection is empty.
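
Part 2 of this test can be sketched in Python with effects sets held as integer bit masks. The sketch assumes that the LDEF set of each block and the lists of blocks lying between successive dominators have already been computed; the function and parameter names are hypothetical.

```python
def union(masks):
    """Bitwise OR of a sequence of effects sets held as integers."""
    acc = 0
    for mask in masks:
        acc |= mask
    return acc


def paths_are_clean(ldef, between, b_dependencies):
    """Return True if no block between successive dominators can change B.

    ldef           -- maps a block name to its LDEF set (an int bit mask)
    between        -- between[i] lists the blocks that occur on any flow path
                      from Xi to X(i+1), excluding Xi and X(i+1) themselves
    b_dependencies -- set of effects classes that tuple B might depend on
    """
    for blocks in between:
        modified = union(ldef[b] for b in blocks)
        if modified & b_dependencies:   # non-empty intersection: B may change
            return False
    return True
```

For example, with `ldef = {"then": 0b0100, "else": 0b0010}` and both blocks lying between X1 and X2, a tuple B depending only on class 0 passes the test, while a tuple depending on class 1 fails it.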

The structure of the interface will now be described. The interface routines are called by the back end 12. The
front end 20 must make its implementation of the interface available before it invokes the back end 12. It does this by
placing the addresses of its interface routine entry points in standard global variables. The optimizer 26 can then load
the routine address from the appropriate global variable when it invokes one of these routines. The interface routines
are documented below with names of the form GEM_SE_xxx. The front end must store the entry address of each
corresponding implementation routine in the global variable named GEM_SE_G_xxx.

Tuples that have effects and dependencies are of interest to this interface. Only a few of the IL tuples can have
effects and dependencies. (Roughly speaking, tuples that do a store can have effects; tuples that do a fetch can have
a dependency; tuples that do a routine call can have both.)

More specifically, each tuple falls into one of the following categories:

1. The tuple does not have any effects, nor is it dependent on any effects. (Example: ADD). Tuples that fall into
this class are NOT pushed on the effects stack. Nor are such tuples ever passed to GEM_SE_EFFECTS.

2. The tuple may have effects, but has no dependencies. (Example: STORE).

3. The tuple may have dependencies, but does not cause any effects. (Example: FETCH).

4. The tuple both may have effects (out-effects) and a separate set of dependencies (in-effects). (Example:
procedure calls)

5. The tuple may have both effects and dependencies. The effects it depends on are identical to the effects it
produces. (Example: PREINCR).

A particular tuple called the DEFINES tuple is provided to allow a front end 20 to specify effects which are not
associated with any tuple. One possible use of the DEFINES tuple would be to implement the BLISS CODECOMMENT
feature, which acts as a fence across which optimizations are disallowed. The translation of CODECOMMENT would
be a DEFINES tuple that has all effects, and therefore invalidates all tuples.

Argument passing tuples (such as ARGVAL and ARGADR) have effects and dependencies. However, the effects
and dependencies of a parameter tuple are actually considered to be associated with the routine call that the parameter
tuple belongs to. For example, in the BLISS routine call F(X, .X+.Y), the parameter X would have the effect of changing
X. However, this would not invalidate a previously computed value of .X+.Y, since the effect does not actually occur
until F is called.

The data structure of Figure 4 representing a tuple is accessed by both front end 20 and back end 12, and some
fields of this structure are limited to only front end or only back end access. Every tuple 35 which can have effects or
dependencies will contain one or more longword fields 42 or 43, typically named GEM_TPL_xxx_EFFECTS or
GEM_TPL_xxx_DEPENDENCIES. The field names used for particular tuples are described in the section on The
Intermediate Language. No code in the back end will ever examine or modify these fields; they are reserved for use
by the front end. They are intended as a convenient place to record information which can be used to simplify the
coding of the interface routines. There is a similar longword field named GEM_SYM_EFFECTS in each symbol node
of symbol table 30, which is also reserved for use by the front end 20.
For the straight-line dependency interface, a description of the routines will now be given. The front end provides
an implementation of the following routines:

GEM_SE_PUSH_EFFECT(EIL_TUPLE : in GEM_TUPLE_NODE) -
Pushes the EIL tuple whose address is in the EIL_TUPLE parameter onto the effects stack.

GEM_SE_POP_EFFECT(EIL_TUPLE : in GEM_TUPLE_NODE) - Pops the topmost EIL tuple from the effects
stack. This is guaranteed to be the tuple whose address is in the EIL_TUPLE parameter. Of course, this means that
the parameter is redundant. However, it may simplify the coding of the POP procedure for a front end that doesn't use
a single-stack implementation for the effects stack (see the implementation discussion below).



GEM_TUPLE_NODE =
    GEM_SE_FIND_EFFECT(
        EIL_TUPLE : in GEM_TUPLE_NODE,
        MIN_EXPR_COUNT : value)

Returns the most recently pushed tuple whose GEM_TPL_EXPR_COUNT field is greater than
MIN_EXPR_COUNT (see below), and whose execution may change the results produced by EIL_TUPLE. Returns
null (zero) if no tuple on the stack affects EIL_TUPLE. May also return the same tuple specified in the parameter.

GEM_TUPLE_NODE =
    GEM_SE_FIND_EFFECTS(
        VAR_SYM : in GEM_SYMBOL_NODE,
        MIN_EXPR_COUNT : value)

Returns the most recently pushed tuple whose GEM_TPL_EXPR_COUNT field is greater than
MIN_EXPR_COUNT (see below), and whose execution may modify the value of variable VAR_SYM. Returns null
(zero) if no tuple on the stack affects EIL_TUPLE. May also return the same tuple specified in the parameter.

GEM_SE_PUSH_EFFECT and GEM_SE_POP_EFFECT will be called only with tuples which can have effects.
GEM_SE_FIND_EFFECT will be called only with tuples which can have dependencies.

There is an order of invocation. Every EIL tuple has a field called GEM_TPL_EXPR_COUNT. This field contains
the index of the tuple in a walk of the EILG in which basic blocks are visited in dominator tree depth-first preorder. If
the back end 12 calls GEM_SE_PUSH_EFFECT with a tuple A, and subsequently calls GEM_SE_PUSH_EFFECT or
GEM_SE_FIND_EFFECT with a tuple B, without having called GEM_SE_POP_EFFECT with tuple A in the interim,
then it is guaranteed that either tuple A precedes tuple B in the same basic block, or the basic block containing tuple
A properly dominates the basic block containing tuple B. Therefore, the EXPR_COUNT values of tuples on the effects
stack decrease with increasing stack depth (i.e., more recently pushed tuples have higher EXPR_COUNTs than less
recently pushed tuples). This means that the FIND_EFFECT routine can cut short its search of the effects stack as
soon as it encounters a tuple T whose EXPR_COUNT is less than or equal to the MIN_EXPR_COUNT argument. This
is because all tuples stacked deeper than T are guaranteed to have EXPR_COUNTs that are less than
MIN_EXPR_COUNT.

The mechanism actually used for the implementation of the effects stack is entirely up to the front end 20, as is
the rule that it uses to determine when the execution of one tuple might affect the value computed by another tuple. A
naive stack implementation is certainly possible, though it would probably be inefficient. A more sophisticated
implementation might be built around a hash table, so that multiple small stacks (possibly each concerned with only one or
a few variables) would be used instead of a single large stack.
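
The naive single-stack variant can be sketched in Python as follows. This is an illustrative model only: the `Tuple` record, its `var` field, and the `same_variable` rule are assumptions standing in for whatever representation and invalidation rule a real front end would use. The early exit on MIN_EXPR_COUNT follows the ordering argument given above.

```python
from dataclasses import dataclass


@dataclass
class Tuple:
    expr_count: int   # models GEM_TPL_EXPR_COUNT
    op: str           # e.g. "STORE" or "FETCH"
    var: str          # hypothetical: the variable the tuple accesses


class EffectsStack:
    """Naive single-stack model of PUSH_EFFECT, POP_EFFECT and FIND_EFFECT."""

    def __init__(self, may_affect):
        self._stack = []
        # Front end rule: may_affect(writer, reader) -> bool
        self._may_affect = may_affect

    def push_effect(self, tuple_):
        self._stack.append(tuple_)

    def pop_effect(self, tuple_):
        # The argument is redundant for a single stack: it must be the top.
        top = self._stack.pop()
        assert top is tuple_

    def find_effect(self, tuple_, min_expr_count):
        # Walk from the most recently pushed tuple toward the bottom.
        for candidate in reversed(self._stack):
            if candidate.expr_count <= min_expr_count:
                return None     # everything deeper has a smaller count still
            if self._may_affect(candidate, tuple_):
                return candidate
        return None


def same_variable(writer, reader):
    """Toy rule: a store affects a fetch of the same variable."""
    return writer.var == reader.var
```

A hash-table variant would keep one such stack per variable (or per small group of variables), which is where the otherwise redundant argument of the pop routine becomes useful for locating the right stack.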

The effects-class interface will now be described. Recall that an effects set is a bit vector representing a set of
effects classes, and that an effects class represents some arbitrary set of memory locations. Typically, an effects class
will represent one of the following:

1. A single named variable. For effective optimization, each simple (i.e., non-aggregate) local variable which is
used frequently in a routine should have an effects class dedicated to it.

2. A set of named variables with some common property; for example, in FORTRAN, all the variables in a particular
named common block.

3. A set of memory locations which may not be determined until runtime, but which have some common property;
for example, all the memory locations which are visible outside this routine (and which might therefore be modified
by a routine call); or, in Pascal, all the memory locations which will be dynamically allocated with NEW calls and
which have a particular type.

The literal GEM_SE_K_MAX_EFFECTS is exported by the GEM_SE package. It is the maximum number of distinct
effects classes that the front end 20 may define. It will be 128 in the initial implementation. The
GEM_SE_EFFECTS_SET type is exported by the GEM_SE package. It is a macro which expands to BITVECTOR
[GEM_SE_K_MAX_EFFECTS]. Thus, given the declaration X: GEM_SE_EFFECTS_SET, the following constructs
are all natural (where 0 ≤ N ≤ GEM_SE_K_MAX_EFFECTS - 1):

X[N] = true; ! Add effects class N to set X.

X[N] = false; ! Remove effects class N from set X. 

if .X[N] then ... ! If effects class N is in set X ... 
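
The same three operations can be modeled in Python with an integer used as the bit vector. The class name `EffectsSet` is an assumption; the constant mirrors the value of 128 stated for the initial implementation.

```python
MAX_EFFECTS = 128   # mirrors GEM_SE_K_MAX_EFFECTS in the initial implementation


class EffectsSet:
    """A set of effects classes stored as the bits of one integer."""

    def __init__(self):
        self.bits = 0

    @staticmethod
    def _check(n):
        if not 0 <= n <= MAX_EFFECTS - 1:
            raise IndexError(f"effects class {n} out of range")

    def __setitem__(self, n, value):
        self._check(n)
        if value:
            self.bits |= 1 << n        # add effects class n to the set
        else:
            self.bits &= ~(1 << n)     # remove effects class n from the set

    def __getitem__(self, n):
        self._check(n)
        return bool((self.bits >> n) & 1)   # is effects class n in the set?
```

Union and intersection of two such sets reduce to bitwise OR and AND of the `bits` fields, which is what makes the bit vector representation attractive for the data flow computations described here.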

The interface routines for the effects-class interface will now be described. The front end 20 must provide an
implementation of the following routines:

GEM_SE_EFFECTS(
    EIL_TUPLE : in GEM_TUPLE_NODE,
    EFFECTS_BV : inout GEM_SE_EFFECTS_SET)

The union of the effects of tuple EIL_TUPLE and EFFECTS_BV is written into EFFECTS_BV.

GEM_SE_DEPENDENCIES(
    EIL_TUPLE : in GEM_TUPLE_NODE,
    EFFECTS_BV : inout GEM_SE_EFFECTS_SET)

Writes the set of effects classes that EIL_TUPLE depends on into EFFECTS_BV.

GEM_SE_VARIABLE_DEPENDENCIES(
    SYMBOL : in GEM_SYMBOL_NODE,
    EFFECTS_BV : out GEM_SE_EFFECTS_SET)

Writes into EFFECTS_BV the set of effects classes that might include the memory associated with variable SYMBOL.

GEM_SE_EFFECTS will be called only with tuples which can have effects. GEM_SE_DEPENDENCIES will be
called only with tuples which can have dependencies.

The compiler may provide implementations for the interface routines mentioned above, but these routines are not
intended for use in a production compiler. They are inefficient, and their rules for when one tuple invalidates another
probably will not coincide exactly with the semantics of any particular language. However, they allow useful default
optimizations to occur while other components of a front end 20 are being implemented.

The EFFECTS field of each symbol node is treated as an effects class number, between 32 and
GEM_SE_K_MAX_EFFECTS. When the address expression of a fetch or store tuple has a base symbol, the EFFECTS
field of the symbol is checked. If it is zero, then it is set to a new value between 32 and GEM_SE_K_MAX_EFFECTS.

For computing effects sets, using the effects class implementation as described above, the front end must call
GEM_SE_INIT_EFFECTS_CLASSES before invoking the GEM_IL_BUILD phase.

This implementation provides information about effects by defining a simple model for effects:

1. No variables are overlaid.

2. Data access operations not in canonical form (as defined in CT.006) have (for stores) or depend on (for fetches)
effect 0.

3. Calls have effects 32 through GEM_SE_K_MAX_EFFECTS. ARGADR parameters are treated as if the call
writes into their address operands.

Effects classes 0 and 32 through GEM_SE_K_MAX_EFFECTS are reserved. Effect 0 represents references to
memory such that the variables referenced can't be identified (pointer dereferences, parameters, etc.)

When a variable is first referenced using a data access operator in canonical form it is assigned an effects class
number n in the range 32 to GEM_SE_K_MAX_EFFECTS. The number is recorded in the EFFECTS field of the symbol
node. The reference and all subsequent references to that variable will have effect or dependency n.

The implementation includes some hooks for experimentation, testing, etc.:

1. Tuples that may have effects or dependencies have one or more "effects fields" (EFFECTS, DEPENDENCIES,
EFFECTS_2, etc.) reserved to the front end to record the effects and dependencies of the tuple. The
compiler-supplied effects class callbacks interpret an effects field as a bitvector of length 32 representing the first word of
a GEM_SE_EFFECTS_SET. That is, if bit n of the field is true, the routines add effects class n to the computed
effects of the tuple.

2. The front end can choose the effects class for a variable by writing the effects class number between 1 and
GEM_SE_K_MAX_EFFECTS into the effects field of the variable's symbol node. The effects class routines do not
assign an effects class if the EFFECTS field is not zero.

3. Effects classes 1 through 32 are reserved for use by the front end. It may assign any interpretation to those
effects classes.

To use the straight-line dependency implementation discussed above, the front end must call
GEM_SE_INIT_EFFECTS_STACK before invoking the GEM_DF_DATAFLOW phase. This implementation uses the
information provided by the GEM_SE_EFFECTS and GEM_SE_DEPENDENCIES callbacks to determine
invalidations. That is, GEM_SE_FIND_EFFECT(X) returns the most recently pushed tuple Y such that the intersection of
GEM_SE_EFFECTS(Y) and GEM_SE_DEPENDENCIES(X) is non-null.

INDUCTION VARIABLES

According to one feature of the invention, an improved method of treating induction variables in a compiler is
provided. First, the definition and detection of induction variables and inductive expressions will be discussed.
An integer variable V is said to be an induction variable of loop L if each store to V that occurs in L:

1. increments (or decrements) V by the same amount each time it is executed.

2. is executed at most once in every "complete trip" through the loop. A trip is "complete" if it flows back to the loop
top.



For example, the following code illustrates an induction variable V:

    Label L   V = 1
              IF V > 10
                  GOTO LABEL M
              ELSE
                  PRINT X
                  V = V + 1
              END IF

In the compile function, in addition to finding induction variables, we are also interested in inductive expressions.
Inductive expressions are expressions that can be computed as linear functions of induction variables.
Consider the following program:

    DO I = 1, 100
        X = I * 8
        T = I - 4
        A[I] = T * 4
    END DO

The expressions "I * 8," "I - 4," "T" and "T * 4" are all inductive expressions in that they can be recomputed as linear
functions of I.

As a brief illustration of some of the optimizations based on induction variables, consider the following program 
example: 



    I = 1;
    L: X = X + (4 * I)
       I = I + 1
       if I <= 100 GOTO L

This is a straightforward DO loop, I being the loop control variable. Notice that the inductive expression I * 4 increases
by 4 on each trip through the loop. By introducing a new variable, I2, we can replace the multiplication with an addition,
which is a less expensive operation. This optimization is known as strength reduction, and has been used in optimizing
compilers for a long time:

    I = 1;
    I2 = 4;
    L: X = X + I2
       I = I + 1
       I2 = I2 + 4
       if I <= 100 GOTO L

Note that we now have two variables (I and I2) where we used to have one. We can eliminate the original loop control
variable completely by recasting the uses of I to be in terms of I2:

    I2 = 4;
    L: X = X + I2
       I2 = I2 + 4
       if I2 <= 400 GOTO L

This optimization is known as induction variable elimination. 
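
The three versions of the loop can be transcribed into Python to confirm that they compute the same value of X. The function names and the initial value of X are assumptions; each loop body is a direct transcription of the corresponding listing, with the bottom test written as an exit condition.

```python
def original(x=0):
    """X = X + (4 * I) for I = 1 .. 100, with the test at the loop bottom."""
    i = 1
    while True:
        x = x + 4 * i
        i = i + 1
        if not i <= 100:
            return x


def strength_reduced(x=0):
    """The multiplication is replaced by a running sum held in I2."""
    i, i2 = 1, 4
    while True:
        x = x + i2
        i = i + 1
        i2 = i2 + 4
        if not i <= 100:
            return x


def induction_eliminated(x=0):
    """I is gone; the exit test is recast in terms of I2 (100 * 4 = 400)."""
    i2 = 4
    while True:
        x = x + i2
        i2 = i2 + 4
        if not i2 <= 400:
            return x
```

All three functions add the terms 4, 8, ..., 400 to X, so starting from zero each returns 20200.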

These optimizations (strength reduction and induction variable elimination) operate directly on induction variables.
In addition to these optimizations, induction variable detection provides information to other optimizations such as
auto-inc/dec, vectorization, loop unrolling, etc.

In the model used in the compiler of Fig. 1, induction variables may be incremented more than once during the
loop. Furthermore, the number of changes can even differ with each iteration. In fact, the number of changes can be
zero for a particular iteration. The loop invariant increment value may differ between individual stores, but each
individual store must increment the variable by the same amount whenever it is executed.

There are several different categories of inductive variables, with different properties, including basic induction 
variables, inductive expressions, and pseudo induction variables. 

Basic induction variables are the simplest form of induction variable. They have known properties that apply
throughout the loop. All other induction variables and expressions are always built up as linear functions of a basic
induction variable. Basic induction variables are generally modified in the form I = I + q or I = I - q where "q" is loop
invariant. More generally, however, the requirement is that the assignment be of the form I = f(I) where f(I) is a linear
function of I with a coefficient of 1.

In the algorithms given in the Appendix, the basic induction variables of a particular loop are represented by a set in the loop top. In addition to this set, we also maintain the set of basic induction variables in the loop that have conditional stores that may not be executed on every trip through the loop. This inhibits vectorization and can make strength reduction more "desirable."

An inductive expression is either a reference to an induction variable or a linear function of another inductive expression. Inductive expressions must be in one of the following forms:

    -f(I)
    f(I) + g(I)
    f(I) - g(I)
    f(I) + E
    E + f(I)
    f(I) - E
    E - f(I)
    f(I) * E
    E * f(I)

where f(I) and g(I) are inductive expressions derived from basic induction variable I with respect to loop L and E is invariant in loop L. If there are no stores to I between f(I) and the arithmetic operator of which it is an operand, then the arithmetic operator is an inductive expression derived from basic induction variable I with respect to loop L.
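
The closure of these forms under the listed operators can be illustrated with a small Python sketch. It is not the representation used in the Appendix: the class name `Inductive` is hypothetical, and the loop-invariant operand E is modeled as a plain Python integer for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inductive:
    """An inductive expression coef * I + const for one basic IV."""
    iv: str
    coef: int
    const: int

    def __neg__(self):                              # -f(I)
        return Inductive(self.iv, -self.coef, -self.const)

    def __add__(self, other):
        if isinstance(other, Inductive):            # f(I) + g(I)
            if other.iv != self.iv:
                raise ValueError("operands derive from different basic IVs")
            return Inductive(self.iv, self.coef + other.coef,
                             self.const + other.const)
        return Inductive(self.iv, self.coef, self.const + other)  # f(I) + E

    __radd__ = __add__                              # E + f(I)

    def __sub__(self, other):                       # f(I) - g(I), f(I) - E
        return self + (-other)

    def __rsub__(self, other):                      # E - f(I)
        return (-self) + other

    def __mul__(self, other):                       # f(I) * E
        if isinstance(other, Inductive):
            raise TypeError("f(I) * g(I) is not linear, so not inductive")
        return Inductive(self.iv, self.coef * other, self.const * other)

    __rmul__ = __mul__                              # E * f(I)
```

Every permitted form yields another linear function of the same basic induction variable, while a product of two inductive expressions is rejected because it is no longer linear.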

The other category is pseudo induction variables. Under certain conditions, a variable may behave like an induction variable on all but the first trip through the loop. These can be turned into induction variables (and thus vectorized) by peeling the first iteration off the loop. Such variables are referred to as "pseudo induction variables." This occurs when a fetch within the loop is reached only by two stores, one within the loop that defines a derived induction variable, and another store whose value flows in through the loop top. Additionally, it must be guaranteed that all stores within the loop are executed once per trip.
For example:

D = 50

DO I = 1, n

A[I] = D + ...
D = I + 4

On the first trip through the loop, D has the value 50 at the assignment to A[I]. On subsequent trips, D has the value 5, 6, 7, etc. By unrolling the loop once, the subsequent trips can be vectorized. Note that the algorithms given herein do not find induction variables that are pseudo induction variables.
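
A minimal Python sketch of peeling the first iteration for this example follows. The parameter `extra` is a hypothetical stand-in for the elided "..." term, and only the contents of A are compared; the final value of D is not modeled in the peeled version.

```python
def original(n, extra=0):
    # extra stands in for the elided "..." term (hypothetical placeholder).
    a = {}
    d = 50
    for i in range(1, n + 1):
        a[i] = d + extra
        d = i + 4
    return a


def peeled(n, extra=0):
    a = {}
    if n >= 1:
        a[1] = 50 + extra            # first trip, peeled off: D is still 50
    for i in range(2, n + 1):
        # On later trips D equals (I - 1) + 4, a linear function of I.
        a[i] = (i - 1) + 4 + extra
    return a
```

After peeling, the value fetched inside the remaining loop is an ordinary inductive expression of I.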

In order to identify a basic induction variable the compiler must be able to recognize all stores to it. The absence of the "has aliased stores" attribute guarantees this and thus we only recognize basic induction variables that do not have "has aliased stores."

Detection of basic induction variables requires the use of "sets" of potential induction variables. Doing this dynamically for each loop is an expensive and complicated operation. Instead, we will use the side effect sets used to construct IDEF sets.

A variable "X" is said to be "in" IDEF set S if all the effects that fetches of X depend on are in S. That is, X is in IDEF set S only if GET_VARIABLE_DEPENDENCIES(X) is a subset of S.
Note that the presence of X in a basic induction set implies only that:

(a) X is a basic induction variable or

(b) X is loop invariant and shares IDEF bits with at least one variable that is a basic induction variable.

The algorithm descriptions in the Appendix take the following liberties (perhaps more) in the interest of keeping the algorithm description simple: (1) The collection of the constant parts of the linear function cannot cause an overflow. (2) All stores completely redefine the variable.

The algorithm as set forth in the Appendix starts out by assuming that all variables modified in the loop are basic induction variables. Each loop top has a basic induction variable set. As we find stores that don't satisfy the requirements for basic induction variables, we eliminate variables from the basic IV set of the loop top.

Since inductive expressions and derived induction variables are always functions of basic IVs, we might say that fetches of basic IVs are the atomic forms of inductive expressions. That is, for an expression to have the inductive property it either has inductive operands, or it is a fetch of a basic induction variable.

Using the rules given earlier, we build up inductive expressions from simpler inductive expressions based on assumptions about basic IVs. The basic IV of an inductive expression is always retained with the expression. Thus, after the algorithm has run, we can tell whether the expression is truly inductive by checking to see that the basic IV from which it is derived is still in the basic IV set of the loop.

The FIND_IV algorithm given in the Appendix will become part of the DATAFLOW phase which does a depth first dominator tree walk.

The following is a summary overview of the tuple processing that is done:

select TUPLE[OPCODE]
[FETCH]
    If base symbol is still a basic IV candidate
    then
        mark this tuple as being inductive.
[STORE]
    Let V be the base symbol of the store.
    If the value being stored is not inductive or_else
       the basic IV of the inductive value being stored is not V or_else
       the coefficient of the stored value is not 1
    then
        remove V from the basic IV set of the loop top
    else
        mark the store as being inductive
[ADD, SUB, MUL, etc.]
    If one operand is inductive and other operand is loop invariant
    then
        mark this tuple as being inductive
The fields added to the tuple data structure, and fields added to the flow nodes, to accommodate induction variable detection, are set forth in Table 6a.
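
The tuple processing summarized above can be sketched in Python as follows. This is a simplified illustration rather than the FIND_IV algorithm of the Appendix: the `Tup` record and its fields are hypothetical, a single straight-line loop body stands in for the dominator tree walk, aliasing is ignored, and MUL is only handled when the invariant operand is a literal.

```python
from dataclasses import dataclass


@dataclass
class Tup:
    op: str             # "FETCH", "STORE", "LIT", "ADD", "SUB" or "MUL"
    sym: str = None     # base symbol of a FETCH or STORE
    args: tuple = ()    # operand tuples
    value: int = 0      # literal value of a LIT
    iv: str = None      # basic IV this tuple derives from, if marked inductive
    coef: int = 0       # coefficient of that basic IV


def find_iv(loop_body):
    stored = {t.sym for t in loop_body if t.op == "STORE"}
    basic_ivs = set(stored)     # assume every modified variable is a basic IV

    def invariant(t):
        return t.op == "LIT" or (t.op == "FETCH" and t.sym not in stored)

    for t in loop_body:
        if t.op == "FETCH":
            if t.sym in basic_ivs:
                t.iv, t.coef = t.sym, 1
        elif t.op == "STORE":
            v = t.args[0]
            if v.iv is None or v.iv != t.sym or v.coef != 1:
                basic_ivs.discard(t.sym)
            else:
                t.iv, t.coef = t.sym, 1
        elif t.op in ("ADD", "SUB", "MUL"):
            a, b = t.args
            swapped = a.iv is None
            if swapped:
                a, b = b, a
            if a.iv is None or not invariant(b):
                continue
            if t.op == "MUL":
                if b.op != "LIT":
                    continue    # symbolic coefficients are outside this sketch
                t.iv, t.coef = a.iv, a.coef * b.value
            else:
                negate = t.op == "SUB" and swapped      # E - f(I)
                t.iv, t.coef = a.iv, -a.coef if negate else a.coef
    return basic_ivs


def truly_inductive(t, basic_ivs):
    # A mark only counts if its basic IV survived in the loop-top set.
    return t.iv is not None and t.iv in basic_ivs
```

Because tuples are marked optimistically, the final check in `truly_inductive` is needed: a fetch may have been marked before a later store disqualified its base symbol.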

AUTOMATIC CREATION OF KFOLD ROUTINE 

As previously discussed, the programming language compiler of Fig. 1 translates programs written in a source language into the machine language of a target machine 25. The compiler includes a front end 20, which incorporates knowledge of the source language in module 21 being compiled, and a back end 12, which incorporates knowledge of the machine language of the target machine 25. The front end translates programs from the source language into the intermediate language of the ILG 55, and the back end translates programs from the intermediate language into the target machine language.

The intermediate language generally specifies a collection of operators (for example, add, shift, compare, fetch, store, or tangent), a collection of data types (for example, "signed 32-bit integer," "IEEE S-format floating point," or "character string"), and a representation for values of those data types.

One of the optimizations included in the optimizer 26 is a constant expression evaluation routine. An example of a source code listing that may be related to a constant expression is shown in Fig. 6, where A and B are found to be constants, so A+B is a constant, and I and J are both equal to the same constant. The compiler can do the calculation (A + B), and save the fetch of A and B separately at run time, as well as saving the ADD operation. The I=A+B and J=A+B expressions of the code of Figure 6 are thus both represented as merely STORE #9,I or STORE #9,J. This is known as "constant folding" because the constants are detected, calculated at compile time, and "folded" into the object code image. The mechanism for doing this is part of the optimizer 26, referred to as a Kfold routine.
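
As a rough illustration of constant folding, the following Python sketch folds a small expression tree. The tree encoding and the function name `fold` are hypothetical, and the values A = 4 and B = 5 are assumed only so that A + B matches the literal 9 in the STORE example; Fig. 6 is not reproduced here.

```python
def fold(expr, constants):
    """Fold expr, a nested tuple such as ("ADD", "A", "B"), using known constants."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        # A variable name: replace it if it is known to be constant.
        return constants.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, constants), fold(right, constants)
    if isinstance(left, int) and isinstance(right, int):
        # Both operands are known at compile time, so evaluate now.
        return {"ADD": left + right,
                "SUB": left - right,
                "MUL": left * right}[op]
    return (op, left, right)
```

With these assumed values, `fold(("ADD", "A", "B"), {"A": 4, "B": 5})` reduces to the literal 9, so both stores can use the same folded constant.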

The compiler of Figure 1 incorporates a Kfold routine for evaluating expressions of the intermediate language to find these constant expressions. In general, given an operator of the intermediate language and the values of its operands, this routine will yield the same value which is computed by that operator when applied to those values. Such a constant expression evaluation routine has many applications in a compiler. For example,

(a) The execution speed of the machine code which is generated for a program may be improved if some expressions of the program can be evaluated by the compiler itself rather than when the program is executed.

(b) Some source languages may allow the use of expressions with constant operands to represent constant values. Compilation of a program in such a language requires the evaluation of such expressions by the compiler.

(c) If the repertoire of operations provided in the intermediate language is richer than the set of operations provided by the programming language or environment in which the compiler is implemented, the most convenient way to perform some computation in the compiler may be to represent it as an expression in the intermediate language and submit it to the constant expression evaluation routine.

The implementation of a constant expression evaluation routine may be a matter of considerable difficulty. The IL may have dozens of operations (e.g., ADD, SUBT, COSINE, etc.), and when distinct data types are considered (e.g., INT32, NINT64, FLOATA, etc.), an intermediate language may have hundreds or thousands of distinct operators. The evaluator must be able to apply each of the operations to each of the data types correctly, lest the compiler fail to perform its function fully or correctly. Particularly when floating-point types are involved, it is likely that not all of the operations which can be represented in the intermediate language will be directly available in the programming language in which the compiler is implemented. Consequently, a constant expression evaluation routine is liable to be extremely long, containing hundreds of distinct cases, and be highly error-prone.

According to an important feature of one embodiment of the invention, the crucial point is that the one language in which the precise meaning of an operator of the intermediate language can always be specified both tersely and precisely is the intermediate language itself. That is, the compiler back end itself must be capable of generating code which correctly implements any operator of the intermediate language. Another way to say this is that the compiler back end already embodies the knowledge of the sequences of machine code instructions necessary to realize the effect of each intermediate language operator, and it would be redundant to have to encode this same knowledge again in a different form in the constant expression evaluation routine.

Based upon this concept, according to the invention, the mechanical generation of a constant expression evaluation routine becomes straightforward: The first step is to create a new compiler of Fig. 1, which uses the same back end 12 as the regular compiler, but replaces its front end 20 with the special front end described below. (Equivalently, provide a special mode for the compiler in which it operates as described below.)

Second, the special front end 20 or special mode of operation does not read and translate a source program 21. Instead, it generates the intermediate language for the constant expression evaluation routine, as follows:

(a) The routine does a conditional branch to select a case based on the intermediate language operator specified in the argument list.

(b) Each case contains the code for a single operator. It fetches the operand values from the routine's argument list, applies the operator to them, and returns the result.

(c) Since the routine is being generated directly in the intermediate language, the code for each case simply consists of intermediate language operators to fetch the operands from the argument list, then the intermediate language operator for the particular case, and then the intermediate language operators to return the result.

Third, when this intermediate language graph is submitted to the compiler's back end, it will generate machine code for the constant expression evaluation routine.

In the special front end just described, the front end can contain a list of all the operators for which cases must be generated, and can mechanically generate the intermediate language for each case.

However, the process can be further simplified if, as may often occur, the compiler back end already contains a table of operator information. (For example, such a table may be used to check the correctness of the intermediate language graph generated by the front end.) It is then possible for the special front end to use this table, already provided by the back end, to determine which cases are to be generated.
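
The mechanical generation of the cases from an operator table can be sketched in Python as follows. The table contents, the textual IL syntax, and the names `OPERATOR_TABLE` and `generate_kfold_il` are all hypothetical; the sketch only shows that each case is the same three-part pattern (fetch the operands, apply the operator, return the result) stamped out once per table entry.

```python
# Hypothetical operator table: (operator name, data type, operand count).
OPERATOR_TABLE = [
    ("ADD", "INT32", 2),
    ("SUB", "INT32", 2),
    ("NEG", "INT32", 1),
]


def generate_kfold_il(table):
    """Emit textual IL for a routine that dispatches on its operator argument."""
    il = ["BEGIN kfold(op, args)"]
    for name, dtype, arity in table:
        il.append(f"CASE op == {name}.{dtype}:")
        operands = []
        for k in range(arity):
            # Fetch each operand from the routine's argument list.
            il.append(f"  t{k} = FETCH.{dtype} args[{k}]")
            operands.append(f"t{k}")
        # Apply the operator of this case, then return its result.
        il.append(f"  r = {name}.{dtype} {', '.join(operands)}")
        il.append("  RETURN r")
    il.append("END")
    return il
```

In the scheme described above, the generated IL would then be handed to the ordinary back end, so no operator semantics need to be written by hand in the evaluator.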

TYPE DEFINITION

The compiler of Fig. 1 uses a type definition module referred to as the GEM_TD module. GEM_TD provides the mechanisms used by a front end 20 and back end 12 in constructing program type information to be incorporated in an object module for use by a linker or debugger. It is intended that this type specification service will allow a front end 20 to describe program symbols and their associated type information to the object module builder 29 in a manner independent of target object file requirements. This type specification service acts as a procedural "grammar of types" so that the compiler may associate abstract type specifications and program symbols. The type specification interfaces are defined below, and a number of examples of the use of the GEM_TD services are referenced.

The creation of type information takes place in the context of symbol table 30 creation and allows a front end 20 to specify an abstract representation of program type information. The object module builder 29 will later use this information in constructing Debug symbol table information.

The GEM_TD module provides service routines that allow a front end 20 to describe basic types and derived types. These routines typically construct internal data structures describing the specified type information. A new compiler node type, GEM_TDI, will be defined to manage this type information. The definition of the type node data structure is private to the compiler 12 and may not be altered or examined by the front end 20. When defining a type, the front end 20 is returned a "handle" to the type node by the GEM_TD routine defining the type. The handle allows a front end to associate a type with a program symbol but prohibits it from altering or examining the fields of the data structure.

Type nodes will be created and managed by scope, that is, when transmitting type information, a front end 20 will specify the block node that a type is to be declared within, and the shell will be responsible for the management of the type nodes within that scope. The shell will manage type nodes in a list rooted in the block node in which the type is defined. The block node data structure will be expanded to define the fields TYPE_LIST_HEAD and TYPE_LIST_TAIL.

A front end 20 may choose to make on-the-fly calls to the type specification service routines or may choose to make a pass over the entire symbol table to generate the type information.

After defining a type the front end must associate this type information with the symbols of that type. Symbol nodes will have a new field DST_TYPE_INFO used to associate a symbol with its type. A symbol's DST_TYPE_INFO field will contain the address of the type node handle returned by a GEM_TD service. A symbol node with a DST_TYPE_INFO value of null will have the target specified behavior for symbols not having type information.

Referring to Figure 7, the data fields and relationships are illustrated for the function:

int toy_proc()
{
    float b,c;
    ...
}

A block node 60 for toy_proc contains fields 61 and 62 (decl list pointers) pointing to the entries 63, 64 and 65 in the symbol table 30. Also, it contains fields 66 and 67 functioning as type list pointers, pointing to the entries 68 and 69 in the type list for int and float. The entries 63, 64 and 65 also have pointers 70, 71 and 72 pointing to the entries 68 and 69, for int and float, as the case may be.

The GEM_TD type specification service consists of routines to allow a front end 20 to define standard and derived types and to associate those types with program symbols. The compiler back end 12 will use the resulting type definitions and their symbol node associations to generate target specified Debug Symbol tables. Note that boolean is not considered a basic type. Compilers for languages such as Pascal should define boolean as an enumeration containing the elements true and false.

ACTION LANGUAGE FOR MULTIPASS CODE GENERATOR 

A method for doing code generation in the back end 12 by code generator 29 using code templates will now be described. The selection and application of code templates occurs at four different times during the compilation process.

1. The PATSELECT phase does a pattern match in the CONTEXT pass to select the best code templates. (During this pattern match the UCOMP and DELAY optimization tasks are done in parallel as part of the pattern matching process.)

2. The TNASSIGN and TNLIFE tasks of the CONTEXT pass use context actions of the selected templates to analyze the evaluation order of expressions and to allocate TNs with lifetimes nonlocal to the code templates.

3. The TNBIND pass uses the binding actions of the selected templates to allocate TNs with lifetimes local to the code templates.

4. Finally, the CODE pass uses code generation actions of the selected templates to guide the generation of object code.

A template is used at different times during a compilation. It consists of three major components:

1. ILG Pattern - which guides the template selection process that matches templates to applicable ILG structures.

2. Undelayed Actions - which determine the processing of matched ILG structures during the CONTEXT, TNBIND and CODE passes. The undelayed actions are performed when the template is first processed in each pass. As a result, the template actions for each ILG node are processed three different times - once for each pass. Some of the actions will have meaning for only one pass and will be ignored in the other passes. Other actions will have meanings in more than one pass but the required processing will be different in each pass.

3. Delayed Actions - which also determine the processing of matched ILG structures during the CONTEXT, TNBIND and CODE passes. The delayed actions are performed each pass when the result computed by the template is first processed as the leaf of another template. Delayed actions are useful on target machines like a VAX that have address modes. Simple register machines like a RISC would probably not make heavy use of delayed actions.

An ILG pattern of a code generation template consists of four pieces of information:

1. A result value mode (see the examples given in the Appendix) which encodes the representation of a value computed by the template's generated code.

2. A pattern tree which describes the arrangement of ILG nodes that can be coded by this template. The interior nodes of the pattern tree are IL operators; the leaves of the pattern tree are either value mode sets or IL operators with no operands.

3. A sequence of Boolean tests. All of these must evaluate to true in order for the pattern to be applicable.

4. An integer that represents the "cost" of the code generated by this template.

The pattern matcher or PATSELECT phase matches an ILG subtree with the pattern of a template. If more than one template pattern can be applied at an ILG node then the pattern matcher delays choosing between the alternative templates until it knows which one leads to the lowest estimated code cost.
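
Cost-driven template selection of this kind can be sketched in Python as a recursive search for the cheapest cover of a tree. This is only an illustration under stated assumptions: the template names, the tuple encoding of patterns and trees, and the costs are hypothetical, value modes are reduced to a single "reg" leaf, and the Boolean tests are omitted.

```python
# Hypothetical templates: (name, pattern, cost). A pattern is an operator
# followed by sub-patterns; the leaf "reg" matches any subtree whose value
# has been computed into a register by some other template.
TEMPLATES = [
    ("load_const", ("LIT",), 1),
    ("load_var", ("FETCH",), 2),
    ("add_reg_reg", ("ADD", "reg", "reg"), 1),
    ("add_reg_lit", ("ADD", "reg", ("LIT",)), 1),   # literal used as immediate
]


def match(pattern, tree):
    """Return the subtrees bound to "reg" leaves, or None if there is no match."""
    if pattern == "reg":
        return [tree]
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return None
    bound = []
    for sub_pattern, sub_tree in zip(pattern[1:], tree[1:]):
        sub = match(sub_pattern, sub_tree)
        if sub is None:
            return None
        bound.extend(sub)
    return bound


def best(tree):
    """Return (cost, template names) for the cheapest cover of tree."""
    options = []
    for name, pattern, cost in TEMPLATES:
        bound = match(pattern, tree)
        if bound is None:
            continue
        total, used = cost, [name]
        for sub in bound:
            sub_cost, sub_used = best(sub)
            total += sub_cost
            used += sub_used
        options.append((total, used))
    return min(options)     # raises ValueError if no template applies
```

For the tree `("ADD", ("FETCH",), ("LIT",))` two templates apply at the root, and the choice between them is settled only after the costs of the subtrees each one leaves uncovered are known.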

There are three different action interpreters - the CONTEXT interpreter, the TNBIND interpreter and the CODE interpreter. The actions of each template are performed in three different passes of the compiler by the appropriate interpreter. Although the identical template is used in all three passes, the semantics of the actions are phase dependent so that different things are done each pass. Many actions have meanings in only one of the three passes and they do nothing in the other two passes. Other actions have meanings in more than one pass but the semantics of an action in one pass are often very different from the semantics of the same action in a different pass. However, having only one action sequence in a template makes it very easy to understand and to maintain the dependencies between the various passes.

The action sequence for each template consists of two parts - the undelayed actions and the delayed actions. When a pattern of selected ILG nodes is first processed the undelayed actions are interpreted. When the ILG pattern is later used as the leaf of another ILG pattern then the delayed actions are interpreted.

At the start of interpreting the undelayed actions a table of operand variables is created. An operand variable can contain a temporary name (TN), a literal or a target specific address mode.
Temporary names are each partitioned into one of three classes: (1) permanent TNs, (2) delayed TNs and (3) local TNs. The class of a TN is determined by its lifetime and usage.

Each TN must have an allocation lifetime. The allocation lifetime is begun by the appropriate template action and extends along all flow paths leading to the last use of the TN. The TNs in the permanent class can have a lifetime that ends some arbitrarily large amount of code into the future after creation of the TN. The life of a delayed class TN must begin in a delayed action of a template and terminate shortly afterwards when the TN is used as a leaf. The life of a local TN never extends beyond the interpretation of a single pattern.

The class of a TN determines how it is processed. Permanent class TNs are created once in the CONTEXT pass and the same TN data structure is kept through all three passes and is used to store the complicated lifetime description of the TN. Delayed class and local class TNs have lifetimes of very restricted duration so they do not need a permanent data structure to track this information. As a result, the TN data structures for delayed class and local class TNs are built each pass when interpreting the actions and deleted immediately after their last use in each pass. Interpreting the same action sequence in each pass guarantees identical TN data structures are built in each pass for TNs of these classes.

There will be a large list of different template actions. Some of the actions will be target machine dependent. The 
Appendix contains a list of proposed or example template actions, so that a user can by these code template examples 
determine for a particular embodiment what will be needed. 

THE INTERMEDIATE LANGUAGE REPRESENTATION 

The internal representation used in the compiler framework 10 of Fig. 1 comprises the symbol table 30 and intermediate language graph 55, which are the data structures created by the front end 20 to represent the structure, data, and code of a source module 21. The following describes the nodes which are the primitive components of these data structures, including a specification of the symbol table 30 and intermediate language used in the IL graph 55. In a compiler as described with reference to Fig. 1, the front end 20 generates a symbol table 30 to describe the blocks, routines, variables, literal values, etc. of a program contained in source module 21, and one or more intermediate language graphs 55, to describe the executable code. The following describes these internal data structures.

The design of the compiler of Fig. 1 in general, and of the intermediate language and symbol table in particular, is intended to address a variety of architectures ranging from "Complex Instruction Set Computers" (CISC) such as VAX to "Reduced Instruction Set Computers" (RISC) such as PRISM, MIPS (a 32-bit RISC machine), or an advanced 64-bit RISC architecture. This design does assume that the architecture of target machine 25 has certain basic features. First, byte organization and addressability are assumed, as is twos-complement binary arithmetic with "little-endian" bit ordering. "Reasonable" address representation is also assumed, i.e., that an address fits in a register.

In general, the front end 20 can be oblivious to the details of the target architecture 25 when creating the intermediate representation of a program. Most constructs of the intermediate representation have a well-defined meaning which is independent of the target architecture 25. There are some issues that must be resolved in implementing the front end 20, however. First, not all data types will be available on all architectures, as explained below. Second, arithmetic overflow behavior and the representation of "small integer" arithmetic may vary on different architectures, again, as discussed below. Third, the behaviors of some operators (such as the arithmetic shift operators) are defined only for subranges of the operand values for which the underlying machine instructions are defined on particular architectures. For operand values outside this specified range, such operators may be well behaved for any particular machine, but may have different behaviors on different machines. Lastly, calling conventions will be different on different target systems 25, requiring the front end 20 to generate different intermediate representations for the same source language constructs in some cases.

The phrase "Intermediate Language" refers to an abstract language for specifying executable code. An "Intermediate Language Graph" (ILG) 55 is a particular program expressed in this language.

The intermediate language in graph 55 is really a language of data structures in memory, with pointers providing the syntactic structure. However, there is also an approximate textual representation for ILGs, used for IL dumps written by the compiler as a debugging aid.

The primitive concept of the IL is the tuple as described above with reference to Figure 4; an ILG 55 is made up of tuples 35 representing the operations to be executed. These tuples 35 are tied together by pointers (e.g., operand pointers 38) which represent various relations. The most important relations are the operator-operand relation (a pointer 38 from an operator to each of its operands) and the linear ordering on all the tuples in each basic block of the ILG, which provides a nominal execution order. This linear order is represented by the tuple number 40 within a block, and by the pointers linking all the blocks of a routine or module.

The computation defined by an ILG 55 is as follows:

(1) Start at the BEGIN tuple of the ILG.

(2) Evaluate each tuple in linear order: fetch the saved results of its operands, compute and save its result, and perform any secondary action that may be defined for it. (There are exceptions to this simple evaluation rule for "flow boolean" and "conditional selection" operators.)

(3) After evaluating a branch tuple, continue evaluation at the label tuple selected by that branch tuple. 
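
These three rules can be sketched as a tiny interpreter in Python. The opcode names other than BEGIN, the tuple encoding (an opcode followed by operands, where value operands are tuple indices), and the function name `run` are hypothetical; the exceptions for "flow boolean" and "conditional selection" operators are not modeled.

```python
def run(tuples):
    """Evaluate a list of (opcode, *operands) tuples in linear order."""
    labels = {t[1]: n for n, t in enumerate(tuples) if t[0] == "LABEL"}
    results = {}    # saved result of each evaluated tuple, by tuple index
    store = {}      # variable storage; unset variables read as 0
    # Rule (1): start at the BEGIN tuple.
    pc = next(n for n, t in enumerate(tuples) if t[0] == "BEGIN")
    while True:
        op, *args = tuples[pc]
        nxt = pc + 1                    # rule (2): linear order by default
        if op == "LIT":
            results[pc] = args[0]
        elif op == "FETCH":
            results[pc] = store.get(args[0], 0)
        elif op == "ADD":
            results[pc] = results[args[0]] + results[args[1]]
        elif op == "LEQ":
            results[pc] = results[args[0]] <= results[args[1]]
        elif op == "STORE":
            store[args[0]] = results[args[1]]   # secondary action
        elif op == "BRANCH_IF":
            if results[args[0]]:
                nxt = labels[args[1]]   # rule (3): continue at the label tuple
        elif op == "END":
            return store
        pc = nxt


# X = X + I for I = 1..3, written as a flat tuple list.
PROGRAM = [
    ("BEGIN",),
    ("LIT", 1), ("STORE", "I", 1),
    ("LABEL", "L"),
    ("FETCH", "X"), ("FETCH", "I"), ("ADD", 4, 5), ("STORE", "X", 6),
    ("FETCH", "I"), ("LIT", 1), ("ADD", 8, 9), ("STORE", "I", 10),
    ("FETCH", "I"), ("LIT", 3), ("LEQ", 12, 13), ("BRANCH_IF", 14, "L"),
    ("END",),
]
```

Running `run(PROGRAM)` evaluates the loop body three times and leaves X equal to 6 and I equal to 4.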

It should be understood that these rules define the "meaning" of an IL graph 55. The code generator 29 is allowed to rearrange the actions indicated by the ILG, so long as it preserves their dependencies, as specified by the following rules:

(1) If the ILG 55 contains an expression, and a statement whose execution might affect the value computed by evaluating the expression, then the generated code for the expression and the generated code for the statement must be executed in the same order that the statement and the expression occurred in the ILG.

(2) If the ILG 55 contains two statements whose execution might affect the value computed by evaluating some common expression, then the generated code for the two statements must be executed in the same order that the statements occurred in the ILG.

The question of when the execution of a statement might affect the value computed by the evaluation of an expression is resolved by reference to the side effects mechanism described below.

The ILG 55 constructed by the front end 20 is not the same as the ILG processed by the back end 12. The front end 20 generates a Compact IL Graph (CILG), while the back end 12 processes an Expanded IL Graph (EILG). When the back end 12 generates code for a routine, the first thing it does is to expand that routine's CILG into an EILG. The differences between the two forms are several. First, the CIL provides "shorthand" tuples, which are expanded into sequences of lower-level tuples in the EIL. Second, the nodes which represent EIL tuples have many more fields than the nodes which represent CIL tuples. The additional fields contain information which is used by the back end 12, but which can be computed by the IL expander (or by other back end phases) from the fields in the CIL nodes. Third, there are different structural restrictions on the CILG and the EILG. This description is directed to the compact IL, although this information generally pertains to both the CIL and the EIL.

The structure of a symbol table 30 represents the structure of the module 21 being compiled. At the heart of the table 30 is a tree of block nodes representing the blocks, routines, and lexical scopes of the module 21; the tree structure represents their nesting relationship. Associated with each block node is a list of the symbol nodes which are declared in that block. Associated with each routine block is an ILG 55 representing the code for that routine. A symbol node represents a symbolic entity in the module, such as a variable, label, or entry point. Constant values in the module 21 being compiled are represented by literal nodes. Literal nodes may be referenced both from the symbol table 30 and from ILGs 55. The term "literal table" is also used to refer to the collection of all literal nodes that have been created in a compilation. Frame nodes represent areas of storage in which code and data can be allocated. Generally, these are either the stack frames of routines or PSECTs. Parameter nodes are used to build parameter lists, which are associated with entry point symbols. Each parameter node relates a parameter symbol in a routine with a location in the argument list of an entry point.

Data Types: 

The intermediate representation used in graph 55 describes a program for an abstract machine 25, which has only a small set of types, the data types which are described in the following list. These data types are distinct from the data types of the source language of module 21, which are relevant only to the front end 20. It is the responsibility of the front end 20 to determine, for each target machine 25, the data types to be used to represent each source language data type.

Data Types 
Null 

Representational 
Scalar 
Address 
Signed integer 
Unsigned Integer 
Floating Point 
Complex 
Boolean 

The null data type is a special data type, which is the type of tuples that do not compute a value. A representational data type is a type whose values have a specific representation in the target machine architecture. The representational data types are divided into scalar data types and aggregate data types. A scalar data type is one whose values can be represented in a small fixed number of memory locations or registers. The scalar data types are subdivided into the address data type and the arithmetic data types. Note that the arithmetic types may be used to represent any other kind of data that can fit in the appropriate number of bits. In particular, source language character and logical data types must be represented with integer data types. There is a single address data type, ADDR. A value of type ADDR is represented as a binary integer with 32 or 64 bits.
There are signed integer data types INT8, INT16, INT32, and INT64, where a value of type INTx is represented as a signed binary integer with x bits, and is therefore in the range $-(2^{x-1}) \ldots (2^{x-1}-1)$. The type INT8 may also be referred to as IBYTE. The type INT16 may also be referred to as IWORD. The type INT32 may also be referred to as ILONG. The type INT64 may also be referred to as IQUAD. The integer type with the same number of bits as an address may also be referred to as IADDR. The largest signed integer type supported for the target architecture (INT32 or INT64) may also be referred to as IMAX. Any binary scaling (as in PL/I) must be provided by the front end - there are no IL provisions for a scaled binary data type.

There are unsigned integer data types UINT8, UINT16, UINT32, and UINT64, where a value of type UINTx is
represented as an unsigned binary integer with x bits, and is therefore in the range $0 \ldots 2^{x} - 1$. The type UINT8 may also
be referred to as UBYTE or as CHAR8. The type UINT16 may also be referred to as UWORD or as CHAR16. The type
UINT32 may also be referred to as ULONG. The type UINT64 may also be referred to as UQUAD. The unsigned integer
type with the same number of bits as an address may also be referred to as UADDR. The largest unsigned integer
type supported for the target architecture (UINT32 or UINT64) may also be referred to as UMAX.

The floating point data types are the VAX floating point types, REALF, REALD, REALG, and REALH, and the IEEE
floating point types, REALS, REALT, REALQ, and REALE. Not all of these will necessarily be supported on any
particular target architecture.

The complex data types are CMPLXF, CMPLXD, CMPLXG, CMPLXS, and CMPLXT. A complex value is represented
as a pair of values of the corresponding real type, which represent the real and imaginary parts of the complex
value. Only complex types which correspond to supported floating point types will be supported on a particular target
architecture.

A value of an aggregate data type consists of a sequence of contiguous elements. An aggregate value is characterized
by its body, the actual sequence of elements, and length, the number of elements in the sequence. The aggregate
types are:

(a) Character strings, type STR8, which have elements of type CHAR8.

(b) Extended character strings, type STR16, which have elements of type CHAR16.

(c) Bit strings, type BITS, whose elements are single bits, packed as tightly as possible.

(d) PL/I and COBOL decimal strings, type DECIMAL, whose elements are decimal digits (represented as four-bit
BCD digits, packed two per byte, with a leading sign digit). (The DECIMAL value is characterized by its precision,
the number of digits it contains (not counting the leading sign digit), and its scale, the number of those digits which
are regarded as coming after the decimal point.)

The elements of an aggregate value are numbered starting at zero. (Note that this will require many front ends to
subtract one when translating a source program string index to an IL string index.)

There is no limit on the number of elements which may be processed in a string operation. (A flag might be introduced
in the future to allow the front end to indicate character string expressions whose lengths were guaranteed not to
exceed 65535 characters, and which could therefore be computed efficiently with the VAX character string instructions.)
The length word of a varying-length string in memory will still be only 16 bits. Decimal strings are limited to 31 digits
(plus the sign digit) on all target architectures.

An example of the details of the representational type system for the various target architectures is indicated in
Table 6.

There is a single Boolean data type, BOOL. This is the type of logical values computed during the execution of a
program. It does not have a specified physical representation. For example, a Boolean value might be represented by
the value of a binary integer, the value of a processor condition code, or the value of the processor program counter.
In particular, type BOOL does not correspond to any logical or Boolean data types that may be present in a source
language. These must be represented as INT or UINT values, and converted to and from type BOOL as necessary.

The general features that are common to all tuples in the intermediate language, and the structural characteristics
of ILGs 55 (routines in the intermediate language) will now be described.

An ILG 55 is made up of IL tuple nodes (usually just called tuples). All tuples contain the fields listed in Table 7.
Other fields, known as attributes, occur only in particular kinds of tuples.

Unlike symbol table nodes, which may be allocated with an arbitrary amount of space reserved for use by the front
end 20, CIL tuple nodes will contain only the fields specified here. EIL tuple nodes will contain additional fields, located
at a negative offset from the tuple node address, which are private to the back end 12.

Structure of the ILG

One tuple in an ILG can refer to another tuple in two different ways: as an operand or as an attribute. When only
the operator-operand relation is considered, a CILG is a directed acyclic graph (DAG), while an EILG is a forest (i.e., a
collection of trees).

Attribute pointers 39 create additional structure on the ILG, and also allow references from the ILG to the symbol
table 30. The most important structural relation is the linear order of the ILG, defined by the next tuple and prev tuple
attribute pointers. All of the tuples in a CILG occur in a single list defined by the linear order. The tuples of an EILG
occur in a collection of circular lists, one for each basic block.

The following rules apply to the structure of an ILG. If a front end 20 creates a CILG which violates these rules,
the results are unpredictable, although the back end will attempt, where convenient, to detect violations and terminate
compilation:

(a) A tuple whose result type is NULL is referred to as a statement tuple, and a tuple whose result type is not NULL
is referred to as an expression tuple.

(b) In the CIL:

(i) A scalar or Boolean expression tuple may be an operand of one or more other tuples. An aggregate
expression tuple must be used as an operand of exactly one other tuple, which must be in the same basic block
(see below).

(ii) An operand may be an expression tuple, a symbol node, or a literal node.

(iii) A symbol node used as an operand always has type ADDR. A literal node used as an operand has the
data type of the literal.

(iv) A symbol representing a variable which is allocated to a register does not have an address, in the normal
sense. However, such a symbol may be used as the address operand of a tuple which reads from or writes
to memory (a FETCH or STORE), in which case the tuple will access the indicated register.

(v) If a symbol represents a variable in a stack frame, then that stack frame must be associated with the current
routine or one of its ancestors in the symbol table block tree; otherwise, there would be no way of finding the
stack frame at execution time.

(c) In the EIL, operands must be expression tuples, and every expression tuple must be an operand of exactly one
other tuple.

(d) No statement tuple may be an operand of any other tuple.

(e) A tuple which is an operand of another tuple must precede that tuple in the linear ordering of the ILG. (In an
EILG, this means that the operand and the operator must occur in the same basic block.)

(f) An expression tuple must dominate every tuple of which it is an operand. That is, it must be impossible to get
from an entry point of a routine to a tuple without encountering every operand of that tuple on the way.
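Two of these structural rules, (d) and (e), lend themselves to a mechanical check. The following Python sketch is illustrative only: the ILTuple class, its field names, and the check_ilg function are hypothetical and do not correspond to any data structure of the compiler described here. Symbol and literal operands are modeled as plain strings and skipped.

```python
from dataclasses import dataclass, field

@dataclass
class ILTuple:
    """Hypothetical stand-in for an IL tuple node."""
    name: str
    result_type: str                     # "NULL" marks a statement tuple
    operands: list = field(default_factory=list)

def check_ilg(order):
    """Return rule violations for tuples listed in their linear order."""
    position = {id(t): i for i, t in enumerate(order)}
    errors = []
    for i, t in enumerate(order):
        for op in t.operands:
            if not isinstance(op, ILTuple):
                continue                 # symbol or literal operand
            if op.result_type == "NULL": # rule (d)
                errors.append(f"{t.name}: statement tuple {op.name} used as operand")
            if position.get(id(op), len(order)) >= i:  # rule (e)
                errors.append(f"{t.name}: operand {op.name} does not precede it")
    return errors

fetch = ILTuple("$1", "INT32", ["I"])
store = ILTuple("$2", "NULL", ["X", fetch])
print(check_ilg([fetch, store]))   # well-formed order
print(check_ilg([store, fetch]))   # operand follows its user
```

Rule (f), dominance, would additionally need the control flow between basic blocks, which this sketch does not model.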

Subsequent paragraphs in this section describe the sorts of operations that are available in the intermediate language
and the operators that are used to represent them. The individual operators are all collected in a data structure
called <REFERENCE>(part_tuple_dictionary), the tuple dictionary. Each operator in the dictionary is documented using
a structured format. Table 8 discusses the main categories in this format, the information presented under each, and
the format used to present the information.

The format section of a tuple specifies the number of operands and the allowed operator, operand, and result types
in a single line of the form:

op.type(type-1,...,type-n): result

where op is the name of the tuple operator, and type specifies the allowable operator types. If ".type" is omitted, then
the operator type must be NULL. Otherwise, type must be one of the following:

(a) A specific type name (ADDR, BOOL, BITS, IADDR, etc.) indicates that only the specified type is allowed.

(b) INT, UINT, REAL, CMPLX, or STR indicates that any type belonging to the specified family is legal. For example,
CMPLX means that CMPLXF, CMPLXD, CMPLXG, CMPLXS, and CMPLXT are all allowed; STR means that STR8
and STR16 are allowed.

(c) ALL indicates that any type other than NULL is legal.

(d) A string of the letters I, U, R, C, A, S, and B indicates that any type belonging to a family represented by one
of the letters is allowed, as follows:

    I    INT          A    ADDR
    U    UINT         S    STR
    R    REAL         B    BITS
    C    CMPLX

The expressions "type-1, ..., type-n" specify the allowable types of the tuple's operands. If the parenthesized list is
omitted, then the operator takes no operands. Otherwise, the tuple must have one operand for each type in the
list. Each type-i must be one of the following:

(a) T means that the operand type must be the same as the operator type.

(b) A specific type name (ADDR, BOOL, BITS, IADDR, etc.) means that the operand must have the specified
type.

(c) A string of the type code letters I, U, R, C, A, S, and B has the same meaning that it does for the type
specifier. Note that operands with the type specifier IU, which means "any integer," are generally converted
to type IMAX in the generated code. Program behavior is therefore undefined if the actual value of such an
operand cannot be converted to type IMAX.

(d) If the operator and operand type specifiers are REAL and CMPLX or STR and CHAR, then the actual
operator and operand types must be consistent. For example, the type specification "CADD.CMPLX(T, REAL): T"
indicates that the second operand must have type REALF if the operator type is CMPLXF, REALS if
the operator type is CMPLXS, etc. If the operator type is SB, i.e., character string or bit string, and an operand
type specifier is CHAR, then the operand type must be CHAR8 if the operator type is STR8, CHAR16 if the
operator type is STR16, and IMAX if the operator type is BITS. That is, IMAX is treated as the character type
corresponding to the string type BITS.

The actual operands of the tuple must be tuple nodes whose result types are consistent with the types specified 
by the operand type list. In the CIL, they may also be symbol nodes, which are always treated as having type 
ADDR, or literal nodes, which are treated as having the types specified by their data type fields. 

The expression "result" specifies the allowable result types. If it is omitted, then the operator is a statement operator
and the tuple's result type must be NULL. Otherwise, it is interpreted exactly the same way as the operand type
specifiers.
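The format line and its type specifiers can be read mechanically. The Python sketch below is a hypothetical reader, not part of the tuple dictionary itself: the names FAMILIES, LETTERS, expand, and parse_format are invented for illustration, the REAL family uses REALH for the fourth VAX floating type, and the ALL specifier and the operand placeholder T are deliberately left unexpanded.

```python
import re

FAMILIES = {
    "INT": ["INT8", "INT16", "INT32", "INT64"],
    "UINT": ["UINT8", "UINT16", "UINT32", "UINT64"],
    "REAL": ["REALF", "REALD", "REALG", "REALH",
             "REALS", "REALT", "REALQ", "REALE"],
    "CMPLX": ["CMPLXF", "CMPLXD", "CMPLXG", "CMPLXS", "CMPLXT"],
    "STR": ["STR8", "STR16"],
    "ADDR": ["ADDR"],
    "BITS": ["BITS"],
}
LETTERS = {"I": "INT", "U": "UINT", "R": "REAL", "C": "CMPLX",
           "A": "ADDR", "S": "STR", "B": "BITS"}

def expand(spec):
    """Expand a type specifier into the list of type names it allows."""
    if spec in FAMILIES:                          # family name
        return list(FAMILIES[spec])
    if all(ch in LETTERS for ch in spec):         # string of family letters
        return [t for ch in spec for t in FAMILIES[LETTERS[ch]]]
    return [spec]                                 # specific type name

FORMAT = re.compile(r"^(\w+)(?:\.(\w+))?\s*(?:\(([^)]*)\))?\s*(?::\s*(\w+))?$")

def parse_format(line):
    """Split 'op.type(type-1,...,type-n): result' into its parts."""
    op, optype, operands, result = FORMAT.match(line.strip()).groups()
    return {
        "op": op,
        "type": optype or "NULL",                 # omitted type means NULL
        "operands": [s.strip() for s in operands.split(",")] if operands else [],
        "result": result or "NULL",               # omitted result means NULL
    }

print(parse_format("CADD.CMPLX(T, REAL): T"))
print(expand("SB"))
```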

Addresses and Memory References 

An address expression is one of the references in the intermediate language. The simplest form of address
expression is a symbol. That is, an operand field of a tuple node may contain the address of a symbol node, to represent
the memory address (or the register) associated with that symbol. An address value can also be obtained by fetching
it from memory (a "pointer variable"), by casting an arithmetic value, or by evaluating a preincrement tuple, a
postincrement tuple, or one of the tuples of the following list:



Address Computation Operators

Operator     Meaning

AMINUS       Subtracts an integer from an address to yield a new address.
APLUS        Adds an integer to an address to yield a new address.
BASEDREF     Evaluates the address to yield a new address.
LITADDR      Yields the address of a read-only memory location containing a specified literal value.
UPLINK       Yields the address of the stack frame for the current routine or a routine that contains the current
             routine.



A data access tuple is a tuple which causes a value to be loaded from or stored into memory. (The word "memory"
here includes registers in a register set of the target CPU 25. The only difference between a register and a normal
memory location of the CPU 25 is that the "address" of a register can only be used in a data access tuple.) The data
access operators are listed in Table 9.

In every data access tuple, the first operand is an address expression. Every data access tuple also has an offset
attribute which contains a longword integer. The address of the memory location to be accessed is the sum of the
run-time address operand and the compile-time constant offset attribute.

All data access tuples will have some or all of the attributes listed in Table 10. The uses of the effects, effects2,
and base symbol attributes are discussed in more detail below in the section Interface for Representing Effects.

Another type of reference is the Array Reference. The APLUS and AMINUS tuples are sufficient for all address
computations. However, they do not provide any information about the meaning of an address computation. In
particular, they don't provide any information about array references and subscript expressions that might have been present
in the source code. This information is needed for vectorization. Therefore, the IL has tuples which specifically describe
array references.

For example, given a BLISS vector declared as local X : vector[20,long], a reference to .X[.I] could be represented
as

$1: FETCH.INT32(I);
$2: SUBSCR.IADDR($1, [4], [0]; POSITION=1);
$3: FETCH.INT32(X, $2);

Given a Pascal array declared as var Y : packed array [1..10, 1..10] of 0..255, an assignment Y[I, J] := Z could be
represented as

$1: FETCH.INT32(J);
$2: SUBSCR.IADDR($1, [1], [0]; POSITION=1);
$3: FETCH.INT32(I);
$4: SUBSCR.IADDR($3, [10], $2; POSITION=2);
$5: FETCH.UINT8(Z);
$6: STORE.UINT8($4-11, $5);



The basic array reference operators are AREF and SUBSCR. AREF yields the address of a specified element in
an array. SUBSCR computes the offset of an array element.

The first operand of an AREF tuple is an address expression representing the base address of the array, and its
second operand is a SUBSCR tuple which computes the byte offset from the base address to an element of the array.
The AREF tuple adds the value of the SUBSCR tuple to the base address to compute the address of the indexed
element. In fact, the code for AREF(origin, subscript) is identical to the code for APLUS(origin, subscript).

A SUBSCR tuple computes the offset of an element along one dimension in an array. Its operands are:

(a) The element index. Individual indices in a subscript expression are not normalized for a zero origin. Instead,
an origin offset to account for non-zero lower bounds in the array declaration should be added into the address
operand of the AREF tuple or the offset field of the tuple that uses the element address.

(b) The stride. This is the difference between the addresses of consecutive elements along the dimension. For a
simple vector of longwords, the stride would be a literal 4, but for multidimensional arrays, the "elements" of the
higher dimensions are rows (or larger cross-sections) of the array.

(c) An expression for the remainder of the subscript expression (that is, for the remaining indices in the subscript
expression). This must be either another SUBSCR expression or a literal node representing the integer constant
zero.



The code for SUBSCR(index, stride, remainder) is identical to the code for ADD(MUL(index, stride), remainder). 
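The arithmetic behind SUBSCR and AREF can be checked with a short Python sketch. The function names and the base address used below are hypothetical. The sketch uses a 10-by-10 array of one-byte elements with both lower bounds equal to 1, for which the origin offset works out to 1*10 + 1*1 = 11 bytes.

```python
def subscr(index, stride, remainder):
    """SUBSCR(index, stride, remainder) = ADD(MUL(index, stride), remainder)."""
    return index * stride + remainder

def aref(origin, subscript):
    """AREF(origin, subscript) generates the same code as APLUS."""
    return origin + subscript

def element_address(base, i, j):
    """Address of Y[i, j] for Y : array [1..10, 1..10] of one-byte elements."""
    inner = subscr(j, 1, 0)        # position 1: most rapidly varying index
    outer = subscr(i, 10, inner)   # position 2: stride of one 10-byte row
    # Indices are not normalized to a zero origin, so the origin offset
    # for the lower bounds (1, 1) is subtracted here.
    return aref(base, outer) - 11

base = 1000  # hypothetical base address of Y
print(element_address(base, 1, 1))    # first element sits at the base address
print(element_address(base, 10, 10))  # last element is 99 bytes further on
```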

A SUBSCR tuple also has a position attribute, which indicates the position of the index in the subscript list of the
array reference. It is required that a position number identify the same subscript position in all references to a given
array. For the most effective vectorization, it is recommended that position 1 be the most rapidly varying
subscript, position 2 the next most rapidly varying, etc.

There are several tuple operators that don't really fit in any other section. These miscellaneous operators are the
following:

Operator     Meaning

ADIFF        Computes the integer difference between two addresses.
DEFINES      Encodes side effects or dependencies in the ILG without causing any code to be generated.
VOID         Causes an expression to be evaluated but discards its value.



Arithmetic Tuples: 

The arithmetic tuples are used to manipulate "arithmetic" values: integers, real numbers, and complex numbers.
This includes fetching, storing, and conversions, as well as traditional arithmetic operations such as addition and
multiplication.

The shift instructions in the VAX and RISC architectures are so different from one another that a fully abstract IL 
shift operator would be certain to generate inefficient code on one or both architectures. On the other hand, the IL has
to support shifting, since many source languages have some sort of shift operators. As a compromise, the IL provides 
the following operators (None of the shift operators will ever cause an arithmetic overflow exception.): 

(a) SHL, SHR, and SHRA do a left shift, a logical right shift, and an arithmetic right shift, respectively, and require
a positive shift count. (That is, their behavior is undefined if the shift count is negative.) These support the C shift
operators, and map directly into the RISC architecture shift instructions.

(b) SH does a left shift if its operand is positive, or an arithmetic right shift if its operand is negative. This supports 
the BLISS shift operator, and maps directly into the VAX shift instruction. 

(c) ROT is the rotate operator. Although it is described differently in the VAX and RISC architectures, the actual 
behavior in all cases can be characterized as a left rotation whose count is specified by the least significant n bits 
of the count operand, where n is the base-two logarithm of the register size. (For example, on VAX and MIPS the 
rotate count is the least significant five bits of the count operand.) 
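The behavior of these five operators on a 32-bit register can be modeled in a few lines of Python. This is a behavioral sketch only: the function names mirror the operator names but are otherwise hypothetical, a 32-bit register is assumed, and shift counts of 32 or more are not treated specially.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shl(value, count):
    """Left shift; the count must not be negative."""
    return (value << count) & MASK32

def shr(value, count):
    """Logical right shift: zeros enter from the left."""
    return (value & MASK32) >> count

def shra(value, count):
    """Arithmetic right shift: copies of the sign bit enter from the left."""
    return (to_signed32(value) >> count) & MASK32

def sh(value, count):
    """Left shift for a positive count, arithmetic right shift for a negative one."""
    return shl(value, count) if count >= 0 else shra(value, -count)

def rot(value, count):
    """Left rotation by the least significant five bits of the count."""
    n = count & 31
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32

print(hex(shra(0x80000000, 31)))  # sign bit propagates
print(hex(rot(0x80000001, 1)))    # top bit wraps around to bit 0
```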

Integer overflow is another feature to consider. There is a problem in attempting to specify the sizes for integer
arithmetic in the IL so that, for all target machines, code will be generated that will satisfy the semantics of the source
language and will be as efficient as possible subject to the constraints imposed by those semantics. In particular, some
machines (such as VAX) will happily do byte and word arithmetic, while RISC machines typically do only longword
arithmetic. Doing all the size conversions would be wasteful on a VAX, but emulating true byte or word arithmetic would
be inefficient on a RISC machine.

The following rules are intended to allow the code generator sufficient flexibility to generate reasonable code for
all target machines (Everything that is said about INT types below applies equally to UINT types.):

(a) If the result type of an expression is INTx, the compiler may actually perform the indicated computation with
y-bit arithmetic, where $y \ge x$. This might produce a y-bit result with more than x significant bits, if the original x-bit
computation would have overflowed. For example, an ADD.INT16 might be implemented with a 32-bit add. 20000
+ 30000 results in an overflow when done as a 16-bit add, but produces the legal 32-bit number 50000 when done
as a 32-bit add.

(b) Every arithmetic operator has a suppress overflow flag (which is only meaningful when the tuple result type is
INT or UINT). If this flag is set, then the code generated for a tuple must not report any sort of overflow condition,
regardless of the results of the computation, and may ignore the possible presence of extraneous high-order bits
in the result (except when the result is used as the operand of an XCVT tuple). Note that the suppress overflow
flag is defined in tuples (such as IAND) for which overflow could never occur anyway. Suppressing overflow for
these tuples will be particularly easy. The suppress overflow flag is intended for situations where it would be
semantically incorrect for an operation to overflow. It may result in more costly code on some architectures. On VAX,
for example, extra code is required to suppress overflow detection. Therefore, if it is immaterial whether an
operation overflows, or if the front end knows that a particular operation can never overflow, then this flag should be
cleared to allow the compiler to generate the most efficient code.

(c) The routine block node has a detect overflow flag. If this flag is clear, then the back end is not required to
generate code to detect overflows in integer arithmetic operations. It is free, however, to generate code that will
detect overflows if this is more efficient; mandatory suppression of overflow detection can be accomplished only
by setting the suppress overflow flag in a particular tuple.

(d) If the detect overflow flag is set in the routine block node, then the generated code must guarantee, for each
expression tree, that either the result computed for that expression is valid, or an integer overflow exception is
signalled. This is not a requirement that overflow be detected in every possible subexpression of an expression.
For example, suppose that A, B, C, and X are 16-bit variables, and that A is 32767 and B and C are 1. In the
assignment X := A + B - C, the generated code might compute A + B - C using 32-bit arithmetic and then check
whether the result is a 16-bit result before storing it. This would store the correct answer 32767, even though the
same expression, if computed with 16-bit arithmetic, would result in an integer overflow error. The assignment
X := A + B, on the other hand, would compute the value 32768 correctly, but would then generate an overflow exception
when it attempted to store it into X. The collection of places where overflows must be detected is not clear, but
certainly includes right-hand sides of stores and arguments in routine calls.

(e) Notice also the XCVT conversion operator, which returns the value of its operand, forcing any extraneous
high-order bits of the representation to be consistent with the sign of the actual operand. For example, if E is a UINT8
expression which is evaluated using 32-bit arithmetic, then XCVT.UINT8(E : INT16) will be a 16-bit integer whose
high-order 8 bits are guaranteed to be zero. In general, if E is an expression of type T, then XCVT.T(E : T) can be
used to force the representation of a value to be consistent with its nominal size.

(f) If the representation of an integer operand in some expression contains high-order significant bits beyond the
nominal size of the operand, then the generated code is free to use either the full represented value or the value
at the nominal size. When this is not acceptable, the front end must generate an XCVT tuple to discard unwanted
high-order bits from the representation.
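The example in rule (d) can be replayed with a small Python sketch. Python integers are unbounded, so they stand in here for the wider arithmetic a code generator might choose; the store_int16 helper and the IntegerOverflow exception are hypothetical names, and the sketch checks for overflow only at the store, as the rule permits.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

class IntegerOverflow(Exception):
    """Hypothetical stand-in for the integer overflow exception."""

def store_int16(value):
    """Check a widely computed value against the 16-bit range before storing."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise IntegerOverflow(value)
    return value

a, b, c = 32767, 1, 1

# X := A + B - C: the intermediate 32768 never reaches a 16-bit location,
# so the final result is valid and no exception is needed.
x = store_int16(a + b - c)
print(x)

# X := A + B: the sum is computed correctly but cannot be stored into X.
try:
    store_int16(a + b)
except IntegerOverflow as exc:
    print("overflow on store of", exc)
```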

There is not any mechanism in the IL to disable the detection of floating-point overflow exceptions. A floating-point
overflow will always result in the signalling of an exception. The signalling of floating-point underflow is controlled only
at the routine level. Routine block nodes have a detect underflow flag. If it is set, the compiler is required to generate
code which will detect and report any floating-point underflows which occur in that routine; otherwise, the generated
code must ignore floating-point underflows.

The conversion operators will compute a value of one arithmetic type that is related to a value of another arithmetic
type. The ROUND and TRUNC operators for real-to-integer conversions, the CMPLX operator for real-to-complex
conversions, and the REAL and IMAG operators for complex-to-real conversions are all familiar. (ROUND and TRUNC
are also defined with a real result type.)

CVT is the general purpose conversion operator. It will do conversions between any two arithmetic types. It is
important to be aware, though, that the only conversions that are done directly are UINT-INT, INT-REAL, and
REAL-CMPLX (and of course conversions within a type, such as INT16-INT32). This means, for example, that a
CMPLXG-to-UINT16 conversion will actually be done as the series of conversions CMPLXG-to-REALG, REALG-to-INT32,
INT32-to-UINT16. This is not the behavior of VAX Pascal, which has direct real-to-unsigned conversions.
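Because only adjacent type families convert directly, the sequence of families a CVT passes through can be derived from a single ordering. The Python sketch below is illustrative: CHAIN, family, and family_path are hypothetical names, and the sketch reports only the families visited, not the specific intermediate sizes a back end would pick.

```python
# Families in the order in which direct conversions connect them.
CHAIN = ["UINT", "INT", "REAL", "CMPLX"]

def family(type_name):
    """Map a type name such as 'CMPLXG' or 'UINT16' to its family."""
    for fam in CHAIN:
        if type_name.startswith(fam):
            return fam
    raise ValueError(f"not an arithmetic type: {type_name}")

def family_path(src, dst):
    """Families visited when converting src to dst one direct step at a time."""
    i, j = CHAIN.index(family(src)), CHAIN.index(family(dst))
    step = 1 if j >= i else -1
    return [CHAIN[k] for k in range(i, j + step, step)]

print(family_path("CMPLXG", "UINT16"))  # complex, then real, then signed, then unsigned
print(family_path("INT16", "INT32"))    # a conversion within one family
```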

XCVT is a special operator which deals only with integer types. Like CVT, it yields the value of its result type which
is arithmetically equal to its operand. However, it has the special feature that it will first change the high-order bits of
the representation of the operand so that the operand's representation is arithmetically equal to its value.

For example, consider the expression

XCVT(ADD.UINT8([UINT8=255], [UINT8=2]) : INT16).

If the expression is computed with 32-bit arithmetic, then the result of the ADD might be a register containing
%X00000101 (257). The XCVT would then discard the high-order bits, leaving %X00000001 (1), which would already
be a valid 16-bit signed integer.
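The normalization step that XCVT performs on its operand can be modeled as a mask followed by an optional sign extension. The Python function below is a hypothetical sketch, not the operator's actual implementation; it takes the register contents together with the operand's nominal width and signedness.

```python
def xcvt(representation, operand_bits, operand_signed):
    """Discard bits beyond the operand's nominal size, then apply its sign."""
    value = representation & ((1 << operand_bits) - 1)
    if operand_signed and value >> (operand_bits - 1):
        value -= 1 << operand_bits     # high bit set: negative value
    return value

# ADD.UINT8 of 255 and 2 carried out in a 32-bit register leaves 0x101.
wide_sum = 255 + 2
print(hex(wide_sum))                   # register contents before XCVT
print(xcvt(wide_sum, 8, False))        # value at the nominal UINT8 size
```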

CAST is not really a conversion operator, since it deals with bit patterns, not values. A CAST tuple yields the value
of its result type which has the same bit pattern as its operand (truncating or concatenating zero bits if necessary).

Another type is the variable modification operators. The operators with names of the form OPMOD, where OP is ADD,
IAND, etc., all have an address operand and a value operand. They fetch an arithmetic value from the specified address,
perform the indicated operation between it and the value operand, and store the result back at the same address. They
also yield the computed value. They are intended to implement C's op= operators. For example, the code sequence

$1: ADDMOD.REALF(X, [%F0.1]);
$2: STORE.REALF(Y, $1);

will have the same effect as

$1: FETCH.REALF(X);
$2: ADD.REALF($1, [%F0.1]);
$3: STORE.REALF(X, $2);
$4: STORE.REALF(Y, $2);

These operators also have OPMODA and OPMODX forms, which fetch, update, and replace a value in a packed array
element or a bit field.

The PREINCR, PREINCRA, and PREINCRX operators are essentially the same as ADDMOD, ADDMODA, and
ADDMODX, except that instead of a value operand, they have an attribute field containing a compile-time constant
increment value. They can be applied to addresses (pointer variables) as well as arithmetic variables. They are intended
to implement C's preincrement and predecrement operators.
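The difference between these forms lies only in which value the tuple yields. The Python sketch below models memory as a dictionary keyed by symbol name; that model and the function names are assumptions made for illustration. The postincr function anticipates the POSTINCR operators described in the following paragraph.

```python
def addmod(memory, addr, value):
    """Fetch, add, store back, and yield the newly stored value."""
    memory[addr] = memory[addr] + value
    return memory[addr]

def preincr(memory, addr, increment):
    """Same as addmod, with the increment fixed at compile time."""
    return addmod(memory, addr, increment)

def postincr(memory, addr, increment):
    """Update the location but yield the value it held beforehand."""
    old = memory[addr]
    memory[addr] = old + increment
    return old

memory = {"X": 5}
print(preincr(memory, "X", 1), memory["X"])   # yields the updated value
print(postincr(memory, "X", 1), memory["X"])  # yields the value before the update
```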

The POSTINCR, POSTINCRA, and POSTINCRX operators are the same as the PREINCR, PREINCRA, and PREINCRX
tuples, except that the value of the tuple is the value that the memory location held before it was updated, rather than
the value that was stored back into it. They are intended to implement C's postincrement and postdecrement operators.

Strings:

The string (or aggregate) types of the compiler are types whose values are sequences of values from a base type. 
These types are: 

STR8, a sequence of eight-bit characters (type CHAR8). 

STR16, a sequence of sixteen-bit characters (type CHAR16).

BITS, a sequence of single bits. 

DECIMAL, a sequence of decimal digits and an associated precision. 

The elements in a character or bit string sequence are numbered from 0 to n - 1, where n is the string length. If
an eight-bit character string is represented in memory at address A, then the byte at address A contains the first
character of the string, the byte at address A + 1 contains the second character of the string, and so on through the
byte at address A + n - 1, which contains the last character of the string. If a sixteen-bit character string is represented
in memory at address A, then the word at address A contains the first character of the string, the word at address A +
2 contains the second character of the string, and so on through the word at address A + 2(n - 1), which contains the
last character of the string. If a bit string is represented in memory at address A, then the first eight bits of the string
are the least significant through the most significant bits of the byte at address A, the second eight bits are the least
significant through the most significant bits of the byte at address A + 1, etc.
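The layout rules translate directly into address arithmetic. The Python sketch below is illustrative and uses hypothetical function names; it assumes that the first eight bits of a bit string occupy the byte at address A, least significant bit first, with later bits continuing in the bytes that follow.

```python
def str8_address(a, i):
    """Address of element i of an eight-bit character string at address a."""
    return a + i

def str16_address(a, i):
    """Address of element i of a sixteen-bit character string at address a."""
    return a + 2 * i

def bit_location(a, i):
    """(byte address, bit number) of element i of a bit string at address a."""
    return a + i // 8, i % 8           # bit 0 is the least significant bit

def read_bit(memory, i):
    """Read element i of a bit string stored at the start of a bytes object."""
    byte, bit = bit_location(0, i)
    return (memory[byte] >> bit) & 1

print(str16_address(100, 3))
print(bit_location(100, 9))
print(read_bit(bytes([0b00000001, 0b10000000]), 15))
```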

Aggregate values in general must be represented somewhere in memory, unlike scalar values which can occur in
registers, or even as literal operands in machine instructions. However, the semantic model of the intermediate
language is that strings can be fetched, manipulated, and stored just like scalars. The compiler is responsible for allocating
temporaries to hold intermediate string values.

Note that the code generated for string operations must be consistent with this model, even when there is overlap
between the operands. For example, the IL statement STOREF.STR8(A+1, [20], FETCHF.STR8(A, [20])) moves a twenty
character string up one position in memory. It must not simply make twenty copies of the character at A.
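The overlap requirement is easy to see with a byte buffer. In the Python sketch below the helper names are hypothetical; fetchf_str8 takes a snapshot of the source, which plays the role of the compiler-allocated temporary, while naive_forward_copy shows the incorrect byte-at-a-time behavior that the model rules out.

```python
def fetchf_str8(memory, src, length):
    """Fetch a string value; the copy acts as the intermediate temporary."""
    return bytes(memory[src:src + length])

def storef_str8(memory, dst, length, value):
    """Store a previously fetched string value."""
    memory[dst:dst + length] = value[:length]

def naive_forward_copy(memory, src, dst, length):
    """Incorrect for overlapping operands: reads bytes it has just written."""
    for k in range(length):
        memory[dst + k] = memory[src + k]

good = bytearray(b"ABCDE_")
storef_str8(good, 1, 5, fetchf_str8(good, 0, 5))
print(good)   # the string moved up one position

bad = bytearray(b"ABCDE_")
naive_forward_copy(bad, 0, 1, 5)
print(bad)    # copies of the first character only
```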

A string is said to be empty if its length is zero. Let head be a function that returns the first element of a non-empty
string, tail be a function that returns the string containing all elements except the first of a non-empty string, and empty
be a function that is true if a string is empty and false otherwise. Then the relation between two strings X and Y, as
tested by the standard comparison operators (EQL, NEQ, LSS, LEQ, GTR, GEQ), is defined as follows:

If $\mathrm{empty}(X) \land \mathrm{empty}(Y)$ then $X = Y$.
If $\mathrm{empty}(X) \land \lnot\mathrm{empty}(Y)$ then $X < Y$.
If $\lnot\mathrm{empty}(X) \land \mathrm{empty}(Y)$ then $X > Y$.
If $\lnot\mathrm{empty}(X) \land \lnot\mathrm{empty}(Y) \land \mathrm{head}(X) < \mathrm{head}(Y)$ then $X < Y$.
If $\lnot\mathrm{empty}(X) \land \lnot\mathrm{empty}(Y) \land \mathrm{head}(X) > \mathrm{head}(Y)$ then $X > Y$.
If $\lnot\mathrm{empty}(X) \land \lnot\mathrm{empty}(Y) \land \mathrm{head}(X) = \mathrm{head}(Y)$ then $\mathrm{rel}(X, Y) = \mathrm{rel}(\mathrm{tail}(X), \mathrm{tail}(Y))$.

The string comparison operators in some languages (such as Pascal) operate only on equal-length strings, padding
the shorter string in a comparison to the length of the longer string. Therefore, the IL also has padded string comparison
operators, EQLP, NEQP, LSSP, LEQP, GTRP, and GEQP.

All of the string operators are listed in Table 12.
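The recursive definition of the string relation transcribes almost line for line into Python. In the sketch below, rel returns -1, 0, or 1 for less than, equal, and greater than. The padded variant assumes a blank as the pad character, which is an assumption of this sketch rather than something the definition above fixes.

```python
def rel(x, y):
    """Compare two strings by the head/tail definition: -1, 0, or 1."""
    if not x and not y:
        return 0                     # both empty: equal
    if not x:
        return -1                    # only x empty: x < y
    if not y:
        return 1                     # only y empty: x > y
    if x[0] < y[0]:
        return -1
    if x[0] > y[0]:
        return 1
    return rel(x[1:], y[1:])         # equal heads: compare the tails

def rel_padded(x, y, pad=" "):
    """Padded comparison: extend the shorter string before comparing."""
    n = max(len(x), len(y))
    return rel(x.ljust(n, pad), y.ljust(n, pad))

print(rel("ab", "ab "))         # unpadded: the shorter string is less
print(rel_padded("ab", "ab "))  # padded: the strings compare equal
```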

Booleans: 

Unlike the representational data types, the Boolean data type does not have a unique representation. During
program execution, Boolean values may be represented explicitly by the value of some bit in a binary integer, or implicitly
by the particular code path that is taken. Since there is no unique representation, it is not possible to have Boolean
variables in the IL. However, most source languages provide for the logical interpretation of representational values,
and many allow the declaration of logical or Boolean variables. Therefore, operators are needed to convert between
Boolean values and their source language binary representations.

The LBSET operator interprets an integer as a Boolean by testing its least significant bit, and the NONZERO
operator interprets an integer as a Boolean by testing whether the whole integer is zero or not. The LSBIT operator
represents a Boolean value as an integer with the bit pattern <00 ... 00> or <00 ... 01>, and the ALLBITS operator
represents a Boolean value as an integer with the bit pattern <00 ... 00> or <11 ... 11>. These operators support the
binary representation of Boolean values in the various source languages as follows:

    Source Language    Binary to Boolean    Boolean to Binary
    Ada                LBSET                LSBIT
    BLISS              LBSET                LSBIT
    C                  NONZERO              LSBIT
    FORTRAN            LBSET                ALLBITS
    Pascal             LBSET                LSBIT
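
As an illustration, the four conversion operators can be modelled in Python as shown below. The 32-bit integer width and the lower-case function names are assumptions of this sketch; the operator behaviour and the per-language pairing follow the description above.

```python
WIDTH = 32                      # assumed integer width for the sketch
MASK = (1 << WIDTH) - 1


def lbset(value: int) -> bool:
    """Binary to Boolean: test the least significant bit."""
    return (value & 1) == 1


def nonzero(value: int) -> bool:
    """Binary to Boolean: test whether the whole integer is non-zero."""
    return (value & MASK) != 0


def lsbit(flag: bool) -> int:
    """Boolean to binary: <00 ... 00> or <00 ... 01>."""
    return 1 if flag else 0


def allbits(flag: bool) -> int:
    """Boolean to binary: <00 ... 00> or <11 ... 11>."""
    return MASK if flag else 0


# (binary-to-Boolean, Boolean-to-binary) pair per source language
CONVERSIONS = {
    "Ada": (lbset, lsbit),
    "BLISS": (lbset, lsbit),
    "C": (nonzero, lsbit),
    "FORTRAN": (lbset, allbits),
    "Pascal": (lbset, lsbit),
}
```

The value 2 shows why the choice matters: NONZERO treats it as true, while LBSET treats it as false.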



Even though Boolean values do not have a representation, and therefore cannot be represented with normal literal
nodes, it is very desirable to be able to apply all the regular IL transformations to Boolean expressions. Therefore, the
back end 12 provides two special literal nodes, whose addresses are contained in the global variables
GEM$ST_G_TRUE and GEM$ST_G_FALSE. These literal nodes cannot be used for static storage initialization, but
they can be used as operands in an ILG.

Boolean expressions involving AND and OR operators can be evaluated in two different ways, full evaluation and
flow or short-circuit evaluation. In full evaluation, both operands are fully evaluated, yielding real mode values, which
are then used as operands to an AND or OR instruction to yield a real mode result. In flow or short-circuit evaluation,
the first operand is evaluated. If the value of the expression is determined by the value of the first operand, then the
second operand is skipped; otherwise, the second operand is evaluated and the value of the expression is the value
of the second operand.

Some source languages require full evaluation of AND and OR expressions; others require (or have special
operators for) short-circuit evaluation; and still others do not specify the kind of evaluation, leaving the choice to the
compiler. Three sets of operators are provided for these three cases:

(a) LANDC and LORC ("Logical AND Conditional" and "Logical OR Conditional") are the flow Boolean operators.
They evaluate their first operands and then may bypass the evaluation of their second operands.

(b) LANDU and LORU ("Logical AND Unconditional" and "Logical OR Unconditional") are the full evaluation
Boolean operators. They behave like normal binary operators, computing a result value from two fully evaluated
operand expressions.

(c) LAND and LOR ("Logical AND" and "Logical OR") are CIL operators which do not specify either the kind of
evaluation or the order of the operands. During IL expansion, they may be replaced either by LANDC and LORC
or by LANDU and LORU tuples. Furthermore, when they are replaced by LANDC and LORC tuples, their operands
may be interchanged if the cost of evaluating their first operands appears to be greater than the cost of evaluating
their second operands.
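
The replacement described in (c) might be sketched as follows. This is only an illustration: the Operand class, its cost field, and the use_flow switch are hypothetical, and the text does not say how the choice between the conditional and unconditional forms is made.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    name: str
    cost: int   # hypothetical estimate of the cost of evaluating this operand


def expand(op: str, first: Operand, second: Operand, use_flow: bool):
    """Replace a LAND or LOR by its conditional (C) or unconditional (U) form."""
    if op not in ("LAND", "LOR"):
        raise ValueError(f"not a generic Boolean operator: {op}")
    if not use_flow:
        return (op + "U", first, second)      # full evaluation, operand order kept
    if first.cost > second.cost:
        first, second = second, first         # evaluate the cheaper operand first
    return (op + "C", first, second)


expensive = Operand("call_result", 10)
cheap = Operand("flag", 1)
flow_form = expand("LAND", expensive, cheap, use_flow=True)
full_form = expand("LOR", expensive, cheap, use_flow=False)
```

In this sketch flow_form becomes a LANDC with the cheap operand first, and full_form becomes a LORU with the original operand order.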

The back end 12 must be able to identify the tuples belonging to each operand of a LAND, LOR, LANDC, or LORC
tuple. In the CIL, the FLOWMARK tuple is used for this purpose. All of the tuples associated with the first operand of
one of these tuples must immediately precede all of the tuples associated with the second operand, which must
immediately precede the Boolean operator tuple itself. The first tuple associated with any operand of one of these tuples
must be immediately preceded by a FLOWMARK tuple.

For example,

    $1: FLOWMARK;             ! Start of first operand
    $2: FETCH(X);
    $3: GTR($2, [0]);
    $4: FLOWMARK;             ! Start of second operand
    $5: FETCH(X);
    $6: LSS($5, [10]);
    $7: LAND($3, $6);         ! Operator tuple


The selection operators will select one of two values of any type, depending on the value of a Boolean operand.
Like the logical OR and AND tuples, there are three selection tuples:

(a) SELC will evaluate only its second or its third operand, depending on whether its first operand is true or false.

(b) SELU will always evaluate all three of its operands, and then will select the value of either its second or third
operand.

(c) SEL is a CIL operator which does not specify the kind of evaluation. It is replaced by either a SELC or a SELU
operator during IL expansion.

Also like the logical AND and OR tuples, SEL and SELC require that the tuples associated with their operands be
contiguous, in operand order, and preceded with FLOWMARK tuples.
For example,

    $1: FLOWMARK;             ! Start of first operand
    $2: FETCH(X);
    $3: GEQ($2, [0]);
    $4: FLOWMARK;             ! Start of second operand
    $5: FETCH(X);
    $6: FLOWMARK;             ! Start of third operand
    $7: FETCH(X);
    $8: NEG($7);
    $9: SEL($3, $5, $8);      ! Operator tuple

or

    $1: FLOWMARK;             ! Start of first operand
    $2: FETCH(X);
    $3: GEQ($2, [0]);
    $4: FLOWMARK;             ! There is no code for the second operand
    $5: FLOWMARK;             ! Start of third operand
    $6: FETCH(X);
    $7: SEL($3, [0], $6);     ! Operator tuple (note the second operand)
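
The role of the FLOWMARK tuples can be illustrated with a small Python sketch that splits the tuples preceding a single, non-nested operator into per-operand groups. The function name and the list-of-operator-names representation are assumptions of the sketch; nested Boolean or selection operators inside an operand are not handled.

```python
def operand_groups(ops):
    """Split the tuples preceding one operator into groups, one per FLOWMARK."""
    if not ops or ops[0] != "FLOWMARK":
        raise ValueError("the first operand must be preceded by a FLOWMARK")
    groups = []
    for op in ops:
        if op == "FLOWMARK":
            groups.append([])          # a new operand starts here
        else:
            groups[-1].append(op)      # tuple belongs to the current operand
    return groups


# Tuples $1 to $6 of the second selection example above
second_example = ["FLOWMARK", "FETCH", "GEQ", "FLOWMARK", "FLOWMARK", "FETCH"]
groups = operand_groups(second_example)
```

For the second example the middle group is empty, which matches the remark that there is no code for the second operand.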



All of the Boolean operators are listed in Table 13.

Runtime Checking:

Checking operators verify that some condition is true during the execution of the program, and cause an exception
if the condition is not true. Except for the ASSERT operator, all of the checking operators return the value of their first
operand. Every checking tuple has a condition field, which specifies the exception to be signalled if the condition is not
true, and a can be continued field, which indicates whether control might be returned after the exception is signalled.
If control returns to a checking tuple after an exception, then the checking tuple will return the same value that it would
have returned if the exception had not occurred. The checking operators are listed in Table 14.



Flow Control:

An ILG 55 is made up of basic blocks. A basic block is a sequence of tuples beginning with a branch target tuple
and ending with a branch tuple or a flow termination tuple. A basic block is entered only at its beginning, and in principle
all code in it is then executed before control passes out of it at its end (but see the discussion of conditional evaluation
above).

In a CILG, the basic blocks are concatenated end to end. The branch tuple at the end of a basic block may be
omitted if control flows from it into the following basic block, which must begin with a LABEL tuple. Similarly, the LABEL
tuple at the beginning of a basic block may be omitted if there are no branches to it. (That is, if the back end sees a
LABEL tuple which is not preceded by a branch tuple, then it inserts a BRANCH to it; if it sees a branch tuple which is
not followed by a branch target tuple, then it inserts a LABEL tuple with a synthesized label symbol.) The IL expansion
phase produces a circular list of tuples for each basic block, with a separate flow graph data structure to represent the
relations between them.
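
The insertion of the omitted BRANCH and LABEL tuples can be sketched in Python. The sketch is a simplification under stated assumptions: tuples are modelled as (operator, operand) pairs, BRANCH stands for every kind of branch tuple, LABEL stands for every kind of branch target tuple, flow termination tuples are not modelled, and the synthesized label names are invented.

```python
from itertools import count


def normalize(tuples):
    """Make every basic block start with a LABEL and end with a BRANCH."""
    out = []
    fresh = (f"$synth{n}" for n in count(1))     # hypothetical synthesized labels
    for i, (op, arg) in enumerate(tuples):
        if op == "LABEL" and out and out[-1][0] != "BRANCH":
            out.append(("BRANCH", arg))          # make the fall-through explicit
        out.append((op, arg))
        nxt = tuples[i + 1][0] if i + 1 < len(tuples) else None
        if op == "BRANCH" and nxt is not None and nxt != "LABEL":
            out.append(("LABEL", next(fresh)))   # start a new block after the branch
    return out


cilg = [("LABEL", "L1"), ("ADD", None), ("LABEL", "L2"),
        ("SUB", None), ("BRANCH", "L1"), ("MUL", None)]
expanded = normalize(cilg)
```

In this example a BRANCH to L2 is inserted before the second LABEL, and a synthesized LABEL is inserted between the explicit BRANCH and the MUL tuple that follows it.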

Within a basic block, flow implicitly follows the linear tuple ordering. Because all flow between basic blocks is
represented with explicit flow control tuples, the basic blocks of an ILG may be arranged in any order without affecting
the meaning of a routine.

The branch target tuple at the beginning of each basic block contains a pointer to a label symbol or entry symbol
node in the symbol table. Control flow between basic blocks is represented by a destination list which is an attribute
of a branch tuple. Each node in a destination list points to a label symbol or entry symbol node which is also pointed
to by some branch target tuple in the same routine, to indicate that control might be transferred to the basic block that
begins with that branch target tuple.

A branch target tuple marks the start of a basic block. All branch target tuples have the following attributes:

    Attribute       Meaning
    Block entry     A flag indicating whether this is the entry basic block of its scope.
    Label symbol    A pointer to the label or entry symbol node which is associated with this tuple.
    Scope block     A pointer to a block node in the symbol table.
    Volatile        A flag indicating that control can reach this basic block by some control transfer (such as a
                    non-local goto) which is not represented in the ILG for this routine.


A branch tuple marks the end of a basic block and specifies its successors. All branch tuples have the following
attributes:

    Attribute           Meaning
    Destination list    A pointer to the destination list for the branch.
    Target symbol       A pointer to a symbol node. This field is used in only a few branch operators, and has a
                        different meaning in each one, but it will always either be null or contain a pointer to a
                        label symbol node.

A destination list is a list of destination nodes, linked together by their next fields. The destination list field of a
branch tuple contains a pointer to the first destination node in such a list. (Note that a destination node can occur in
only one destination list, and a destination list can be pointed to by only one branch tuple. Even if two branches have
the same destinations, they still must have distinct, albeit identical, destination lists.) Every destination node has a
target field, which contains a pointer to a label or entry symbol node. A destination node represents a potential transfer
of control to the basic block whose branch target tuple's label symbol field contains a pointer to the same symbol node.
There are two kinds of destination nodes. Most kinds of branch tuples use simple destination nodes, and choose a
destination based on its position in the destination list. BRSEL tuples, however, use selector destination nodes, and
choose the destination whose selector matches the tuple's operand value. A selector destination node has additional
fields low test and high test, both longword integers. It matches an operand value if the operand value falls between
the destination's low test and high test values.
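
Selector matching for a BRSEL tuple can be sketched as follows. The class and function names are hypothetical, the range is assumed to be inclusive at both ends, and returning None when no selector matches is a choice of the sketch; the text does not state what happens in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SelectorDestination:
    target: str       # label or entry symbol this destination refers to
    low_test: int
    high_test: int


def select_destination(dests: List[SelectorDestination], operand: int) -> Optional[str]:
    """Return the target of the first destination whose range contains the operand."""
    for dest in dests:
        if dest.low_test <= operand <= dest.high_test:   # inclusive range assumed
            return dest.target
    return None


case_list = [
    SelectorDestination("L_SMALL", 0, 9),
    SelectorDestination("L_LARGE", 10, 99),
]
```

With this list, an operand value of 10 selects L_LARGE and a value of 100 matches nothing.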

Unlike the regular branch operators, which specify a set of destinations with a destination list and then select one
of them based on a tuple operand, the indirect branch operators (JUMP and JUMPLOCAL) cause control to be
transferred to the address specified by an address expression (usually a label variable). These would be the operators used
for a FORTRAN assigned goto or a PL/I goto through a label variable.

The back end still needs to know the possible destinations of an indirect branch tuple so that it can build the routine
flow graph correctly. Therefore, indirect branch tuples have a destination list, just like regular branch operators.
However, their destination list contains only a single destination (which is optional for JUMP tuples). The target label of this
destination node identifies a VLABEL tuple which is immediately followed by a VBRANCH tuple. The destination list
of the VBRANCH tuple then lists all of the actual possible destinations in this routine of the indirect branch.

This combination of a VLABEL tuple and a VBRANCH tuple is referred to as a virtual basic block. No code is ever
generated for it (which is why there must not be any other tuples between the VLABEL and the VBRANCH). It represents
the fact that control can pass from the indirect branch to any of the successors of the virtual block. This has the
advantage that if many indirect branches have the same set of possible destinations, a single virtual basic block can
represent the possible destinations of all of them.
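
The sharing of one virtual basic block among several indirect branches can be pictured with the following sketch. The dictionaries and all label names are invented for illustration and do not reflect the actual ILG data structures.

```python
# Destination list of each VBRANCH tuple, keyed by the label of its VLABEL tuple.
vbranch_destinations = {"V1": ["L10", "L20", "L30"]}

# Each indirect branch has at most one destination: the VLABEL of a virtual block.
# A JUMP without a destination list is modelled as None.
indirect_branch_target = {"jump_a": "V1", "jump_b": "V1", "jump_out": None}


def possible_destinations(branch: str) -> list:
    """Flow graph successors of an indirect branch within the current routine."""
    vlabel = indirect_branch_target[branch]
    if vlabel is None:
        return []                                  # no destinations in this routine
    return list(vbranch_destinations[vlabel])      # successors of the virtual block
```

Both jump_a and jump_b obtain the same three successors from the single virtual block V1, so the list of labels is written only once.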

There is one other virtual basic block in every routine. This is the block which consists of the BEGIN and ENTRYPTR
tuples. No code is generated for it, since execution always begins at an ENTRY tuple, but it identifies all the entry
points of the routine for the back end.

A basic block may end with a branch tuple or with a flow termination tuple. When control reaches a flow termination
tuple, it leaves the current routine completely. Since flow termination tuples do not transfer control to a destination in
the current routine, they do not have destination list and target symbol attributes.

Note that the JUMP operator is effectively a flow termination operator if it does not have a destination list, since
that means that it does not have any possible destinations in the current routine. JUMPSYMBOL is a flow termination
operator which is used to represent a non-local goto to a known label in the CIL; in the EIL it is replaced by such a
non-local JUMP.

All of the flow control operators are listed in Table 15. 

Routine Calls and Parameter Passing: 

There are three types of linkage conventions: control, parameter, and return value. The phrase "linkage conventions"
refers to all the rules about the generated code which allow a calling routine and a called routine to "talk to each other"
properly. Some of these rules are built in to the code generator 29. In other cases there are choices, which must be
made consistently for a calling and called routine. Some of these choices will be made by the shell (when it has access
to both routines); others must be made by the front end 20, and encoded in the symbol table 30 and ILG 55.

A control linkage convention defines the instructions which must be executed to pass control from a calling to a
called routine, to establish the execution context of the called routine, and to return control to the calling routine. Control
linkage conventions are determined by the INITCALL and CALL tuples in the calling routine and the entry symbol node
for the called routine.

A CALL tuple whose operand is a reference to an entry symbol node which isn't an external reference is an identified
call, and there is complete freedom to select the linkage for it, even to the extent of compiling the called routine in line
or generating a customized copy of the called routine. For unidentified calls, the calling convention field of the INITCALL
tuple must specify the control linkage convention to use for the call. The value of this field must come from the
enumerated type GEM$CALLING_CONVENTION, whose constants are defined in the following list:



    Constant    Meaning
    Standard    Use the standard external call conventions for the target system. (This is the only calling
                convention defined for the MIPS implementation.)
    Call        Use a CALL linkage (VAX only).
    Jsb         Use a JSB linkage (VAX only).


A routine block node has a standard entry field which specifies what control linkage convention to use for the copy
of this routine that will be called by unidentified calls to this routine. The value of this field must come from the
enumerated type GEM$ENTRY_CONVENTION, whose constants are defined in the following list:


    Constant    Meaning
    None        All calls to the routine are identified calls in the current compilation, so it is unnecessary to
                generate an instance of the routine to be called from unidentified calls.
    Standard    Generate a routine that can be called using the standard entry convention. (This is the only
                calling convention defined for the MIPS implementation.)
    Call        Use a CALL linkage (VAX only).
    Jsb         Use a JSB linkage (VAX only).



Parameter Linkage Conventions are another type. A routine call makes an argument list available to the called
routine. The argument list is a collection of scalar values (often addresses) in locations which are known to both the
calling and the called routine by agreement (registers, or locations in a block of memory whose address is contained
in some standard register).

A formal parameter of a called routine is represented by a variable symbol node whose is a parameter flag is set.
The address associated with a parameter symbol is either a storage location specified by the calling routine or a local
storage location which contains a copy of the data passed by the calling routine. (Remember that an "address" may
actually specify a register.) It is derived from the argument list and from the mechanism and the semantic flags of the
parameter symbol, as described below.

A parameter has bind semantics if the address associated with the parameter variable is the address of the storage
location which was passed by the calling routine (the actual storage location). It has copy semantics if the compiler
allocates storage for it in the called routine (the local storage location) and generates copies between the actual and
local storage locations as needed. (The local storage location of a parameter with bind semantics is the same as its
actual storage location.)

The compiler will choose whether to use bind or copy semantics for a parameter based on the usage pattern of
the parameter within the routine and on the flags listed in Table 10-3. ("Alias effects" are discussed in CT0.70, Data
Access Model. Briefly, they are ways that the actual storage location might be accessed, other than through the
parameter symbol. This includes direct reference to a non-local variable which might be the actual storage location,
dereference effects, and calls to other routines which might access the actual storage location.)

Table 17 illustrates the use of the parameter semantic flags as they would be set for various source languages.

A parameter mechanism specifies the relationship between what the calling routine wishes to pass to the called
routine and what is actually stored in the argument list. A parameter symbol has a mechanism field which specifies
the mechanism which is used to pass a value to this parameter, and an argument tuple has a mechanism field which
specifies the mechanism by which this argument is to be passed. The values of these fields must come from the
enumerated type GEM$MECHANISM, whose constants are listed in Table 16.

If a parameter variable's unknown size flag is false, then the size of the parameter is known at compile time, and
is specified by its size field. If unknown size is true, then the size of the parameter is not known at compile time. The
size of an unknown size parameter can be determined at run time if it has the array, string, or address and length
(reference with associated length parameter) mechanism. When a separate length word is passed with the address
and length mechanism, and the parameter has an aggregate data type, the length argument is interpreted as the
parameter size in elements (bits or characters), not in bytes. Furthermore, if the parameter is a character string whose
string representation is varying or asciz, then the size is a maximum size, not the string's current size, and applies only
to the text part of the string, and not to the space that is required for the string length word or null terminator. Note that
a parameter cannot have copy semantics unless the compiler knows how much to copy. If the actual parameter size is
neither known at compile time nor computable by the compiler at run time, then the front end must set the parameter's
must bind flag to force the use of bind semantics.

Another type is Return Value Linkage Conventions. A called routine can return information to its caller in two ways.
The first is by using an output parameter. This is a variable which is passed with a mechanism other than value, so
that the called routine can store a value into it. The second way is with a return value. A return value is a value which
is computed by the called routine and "returned" to the caller, where it becomes available as an expression value
through a special result tuple.

Scalar values can be returned in registers. For example, almost all of our languages return arithmetic function
values in a standard register, and the BLISS "output parameter" feature allows a routine to return values in arbitrary
registers.

For a routine to return a string, there must be tuples in the argument list to allocate a temporary buffer for the return
value and to pass its address to the called routine, tuples in the called routine to store the return value into the buffer,
and tuples in the caller to retrieve the value from the buffer.

When the size of a returned string is determined by the called routine, the caller cannot just allocate space for the
result, since it does not know in advance how big the result will be. The mechanisms listed in Table 19 provide for this
possibility. These mechanisms are provided through special tuples. However, their availability depends on the calling
standard of the target environment.

The caller may: (a) require that the called routine return a value by fixed buffer; (b) require that the called routine
return a value on the stack; (c) request that the called routine return a value by dynamic string, but accept a string
returned on the stack if the called routine so chooses. The called routine must always be prepared to return a
dynamic-size result by fixed buffer or on the stack if the caller requires it. It must also be prepared to return a result either by
dynamic string or on the stack when the caller requests a result by dynamic string.
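
The agreement between caller and called routine described above can be summarized in a small sketch. The mode strings and the callee_prefers_stack parameter are hypothetical names chosen for illustration; they do not correspond to the tuples or mechanism constants of the compiler.

```python
def return_mechanism(caller_mode: str, callee_prefers_stack: bool = False) -> str:
    """Mechanism actually used to return a dynamic-size string result."""
    if caller_mode in ("fixed_buffer", "stack"):
        return caller_mode                 # a requirement: the called routine must comply
    if caller_mode == "dynamic_string":
        # only a request: the called routine may answer on the stack instead
        return "stack" if callee_prefers_stack else "dynamic_string"
    raise ValueError(f"unknown caller mode: {caller_mode}")
```

Only the dynamic string case leaves the called routine a choice, which is why the caller must be ready to accept a stack result in that case.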

Representation of routine calls in the CIL will now be considered. There are many distinct operations involved in
calling a procedure or function. Any of the following steps may be necessary:

(a) Allocate space for the argument list.

(b) Allocate space for pass-by-value operand expressions.

(c) Allocate space for descriptors.

(d) Create argument descriptors.

(e) Create argument descriptors.

(f) Allocate space for result values. (A result value, or output argument, is an argument which does not exist until
after the call. In the IL, a function will be treated as a procedure with a result value.)

(g) Create the argument list.

(h) Call the routine.

(i) Release space that was allocated for arguments, descriptors, and the argument list.

(j) Get the result values from the call.

(k) Free the space that was allocated for the result values.

The general strategy taken in the IL is to provide separate operators for the different operations involved in doing
a call, but to require that these be tied together in a specified fashion. A routine call in the IL consists of:

1. An INITCALL statement, which flags the beginning of the series of actions which will make up the call.

2. A series of argument and temporary allocation statements which will construct the argument list.

3. A call statement (CALL or BPCALL) which actually effects the transfer of control to the called routine.

4. A series of result tuples which make the call's return values accessible.

The INITCALL and call statements are mandatory; the argument list and result tuples are optional. All of the tuples
involved in a call must occur in the same basic block, and any result tuples must follow the call tuple immediately, with
no intervening tuples. There are no other restrictions, though, on what tuples may occur between the INITCALL and
the call. The IL for a routine call may even be contained within the argument list IL for another call.
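
These ordering rules can be checked mechanically, as the following Python sketch suggests. It works on the operator names of one basic block; the function name is hypothetical, RESULT_X is a placeholder for any operator whose name begins with RESULT, and the pairing of a call with the most recent open INITCALL is an assumption of the sketch.

```python
def check_call_sequences(ops) -> bool:
    """Check INITCALL/call pairing and result tuple placement in one basic block."""
    open_calls = 0
    prev = ""
    for op in ops:
        if op == "INITCALL":
            open_calls += 1                      # calls may nest inside argument lists
        elif op in ("CALL", "BPCALL"):
            if open_calls == 0:
                return False                     # call statement without an INITCALL
            open_calls -= 1
        elif op.startswith("RESULT"):
            follows_call = prev in ("CALL", "BPCALL") or prev.startswith("RESULT")
            if not follows_call:
                return False                     # result tuples must follow the call directly
        prev = op
    return open_calls == 0                       # every INITCALL needs its call statement


simple = ["INITCALL", "ARGVAL", "CALL", "RESULT_X"]
nested = ["INITCALL", "ARGVAL", "INITCALL", "ARGVAL", "CALL", "RESULT_X", "ARGVAL", "CALL"]
broken = ["INITCALL", "CALL", "ADD", "RESULT_X"]
```

The nested sequence is accepted because the inner call, with its result tuple, lies entirely within the argument list of the outer call; the broken sequence is rejected because another tuple separates the call from its result tuple.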

Constructing the argument list involves allocating space for the argument list itself, for addresses and descriptors
of arguments, for temporaries to hold values being passed, and for output arguments. It may also involve initializing
the allocated space. These activities are specified in the IL with argument tuples. All argument tuples have names
beginning with ARG, and have the attributes listed in Table 20.

When the calling routine has a value to pass, it uses one of the argument tuples whose names begin with ARGVAL.
With these tuples, the actual argument value is specified as an operand of the argument tuple. Note that this does not
necessarily mean that the argument is passed using the value mechanism. If the mechanism is value, the operand
value is stored directly into the argument list; otherwise, a temporary is allocated, the operand value is stored into the
temporary, and the temporary is passed by reference or descriptor. (This is like %REF in BLISS.) The value mechanism
will only be supported with the ARGVAL tuple with scalar types, and with the ARGVALA tuple with a compile-time
constant size.

When the calling routine has the address of an existing storage location to pass, it uses one of the argument tuples
whose names begin with ARGADR. With these tuples, the address of the actual storage location is specified as an
operand of the argument tuple. Thus, the value mechanism cannot be used with these tuples. Since the occurrence
of one of these tuples in an argument list can cause the called routine to read from or write to a storage location known
to the current routine, these tuples can have dependencies and side effects, and therefore have the offset, effects,
effects2, and base symbol fields that are used in all memory reference tuples, as well as the special flags parm is read
and parm is written, which indicate whether the compiler should assume that the called routine might read from and/
or write to the storage location.

When an argument tuple specifies the general mechanism, code is generated to allocate space for the descriptor
and to fill in its base address field. The front end must explicitly specify any other fields that are to be initialized in the
descriptor. It does this using DSCFIELD tuples, which refer back to a preceding argument tuple with the general
mechanism and specify a value to be stored into a field in the descriptor that was allocated for that argument.

Constructing an Argument Block:

Some RTL linkages may require that a collection of arguments be passed in an argument block, whose address
is passed to the RTL routine like an ordinary reference parameter. This is accomplished using three special tuples.

(a) ARGBLOCK is an argument tuple which allocates a block of a specified size on the stack and passes its address
to the called routine. The block can be initialized using BLKFIELD tuples.

(b) A BLKFIELD tuple is like a DSCFIELD tuple, except that it refers back to a preceding ARGBLOCK tuple instead
of to an arbitrary tuple with the general descriptor mechanism. It stores a value into a field of the argument block.

(c) ARGDEFINES is like an argument tuple, except that it doesn't generate any code. It allows the front end to
specify argument-like side effects which are not associated with a regular argument tuple. In particular, it can be
used to indicate the effects associated with arguments which have been passed through an argument block.

For a routine to return an aggregate value, it must store that value into a location that has been allocated by its
caller. The tuples whose names begin with ARGTMP will allocate a block of storage of a specified size and pass its
address to a called routine. They are the same as the ARGADR tuples, except that the ARGADR tuples pass the
address of an existing block of storage, and the ARGTMP tuples pass the address of a temporary that has been
allocated especially for the call.

The ARGBUF, ARGSTK, and ARGDYN tuples will allocate the temporaries and pass the special descriptors
necessary to obtain a dynamic string return value. These tuples have all the usual argument tuple attributes, but their
mechanism attribute is ignored, since the mechanism is implied by the use of the dynamic return value mechanism.

The tuples whose names begin with RESULT will make the return values from a routine call accessible in the
calling routine. Their effect is to move the output parameters from the temporary locations or registers where they have
been returned by the called routine into more lasting temporaries. The value of a result tuple is simply the value of the
return value that it has retrieved. All the result tuples for a call must immediately follow the call tuple.

Bound Procedure Calls: 

A bound procedure value, or BPV, represents the information needed to call an unknown routine. Since routines
may contain uplevel references to stack allocated variables in other routines, a bound procedure value must incorporate
not only the code address of the routine to be called, but also sufficient information to construct a static link for it.

Unfortunately, BPVs are handled very differently in different software architectures: how they are created, how
they are represented, how they are called, and even how big they are. Therefore, the compiler will not attempt to
provide a consistent representation. Instead, the front end will be expected to generate differing code, depending on
the target software architecture.

(a) In the VAX and MIPS software architectures, a BPV is simply a code address and a context value, and a bound
procedure call is done by loading the context value into a specific register and then doing a call to the code address.
Therefore, the front end will be responsible for representing a BPV as a pair of independent address values. The
code address is obtained with a BPLINK tuple. A call to a BPV should be represented as a CALL whose address
operand is the code address value, with the context value passed by value as a special register argument in the
architecture's static link register.

(b) On RISC machines as referred to, all procedures are represented by descriptors which contain a code address
along with some additional information, and a BPV is simply the address of a special descriptor, constructed at
run time, which contains a context pointer and the address of an RTL routine to load the context pointer and call
the real routine. The front end will have to allocate space for such a descriptor itself, and use the BPVAL tuple to
fill it in. Then the BPV is represented by the address of the descriptor, and a call to the BPV should be represented
by a call to that address.

It is necessary for the back end 12 to know what the parameters are for each entry point in a routine. The front
end 20 accomplishes this by setting the param list and param list tail fields of each entry symbol node to point to the
first and last nodes in a list of parameter nodes (linked by their next fields) that represents the parameter list of that
entry point.

Each parameter node has a symbol field which points to a parameter symbol node of the routine that contains the
entry point, and arg location, pass by register, and special register fields which have the same meaning that they do
in argument tuples (see Table 20). Thus, the list of parameter nodes identifies all the parameters of an entry point and
where they occur in that entry point's argument list.

Note that a parameter symbol may occur in more than one parameter list, possibly with a different arg location in
each. Parameter nodes do not have a mechanism field, however, since the mechanism is regarded as an attribute of
a parameter symbol rather than of its occurrence in a particular argument list.

The RETURNREG tuple returns a scalar value in a specified register, and the RETURNSTK and RETURNDYN
tuples return a string value using one of the dynamic string return mechanisms provided in the PRISM calling standard.
Note that no special tuples are needed for a called routine to return a value through an argument temporary, since
there is no difference between returning a value through an argument temporary and storing a value into an ordinary
output parameter.

The address associated with a parameter symbol is the address of the parameter's local storage location. The
called routine can obtain the address of the descriptor for a parameter with the general descriptor mechanism by using
the DESCADDR tuple. It can obtain the actual size of an unknown size parameter using the SIZE tuple, provided that
the size is available in the argument list (either in a descriptor or in a separate size parameter).

All of the operators involved in routine calls are listed in Table 21.

Storage Allocation and Scoping: 

A lexical block is a range of a source program over which a set of declarations is valid - for example, a routine,
subroutine, function, or begin-end block. In the symbol table, the lexical structure of a routine is represented by a tree
of scope block nodes whose root is the routine block node. Each basic block in the ILG contains code belonging to a
single lexical block. The branch target tuple at the start of a basic block has a scope block field which points to the
corresponding block node in the symbol table. Every lexical block in a routine must have a unique scope entry basic
block, which is the only basic block in the lexical block to which control can pass from any basic block outside that
lexical block. This scope entry basic block is identified by the block entry flag in the branch target tuple.

A reference to a variable symbol in the CIL always yields the address of a storage location (or the name of a
register): 

1. A static variable is one whose storage class is static, global ref, or preserved. Static variables are located in
some PSECT at compile time, so that every reference to such a variable will refer to the same location.

2. A local variable is one whose storage class is automatic, stacklocal, register, or register preferred, and whose
unknown size flag is false. Local variables exist only during a single execution of their lexical scope, and may have
multiple instances if multiple instances of their lexical scope may be executing simultaneously. They are allocated
at compile time to registers or to known locations in their routine's stack frame.

3. A dynamic variable is one with the same storage class as a local variable, but whose unknown size flag is true.
Like local variables, dynamic variables exist only during a single execution of their lexical scope, and may have
multiple instances if multiple instances of their lexical scope may be executing simultaneously. They are allocated
on the stack at run time by a CREATE tuple, and are accessed through an associated pointer variable which is
created by the back end.

4. Parameters with copy semantics behave like local or dynamic variables, depending on the setting of their
unknown size flag.

5. Parameters with bind semantics are not allocated in the called routine at all. They are accessed through an
associated pointer variable which is created by the back end to hold the actual storage location address.
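
The five cases above amount to a decision on storage class, the unknown size flag, and (for parameters) the passing semantics. The following is a minimal sketch of that decision; the function name classify_variable, its argument names, and the returned labels are assumptions made for illustration, not part of the interface described here.

```python
STATIC_CLASSES = {"static", "global ref", "preserved"}
LOCAL_CLASSES = {"automatic", "stacklocal", "register", "register preferred"}


def classify_variable(storage_class=None, unknown_size=False, semantics=None):
    """Return 'static', 'local', 'dynamic', or 'bind parameter'.

    semantics is None for an ordinary variable, or 'copy' / 'bind' for a
    parameter (hypothetical encoding chosen for this sketch).
    """
    if semantics == "bind":
        # Never allocated in the called routine; reached through a pointer.
        return "bind parameter"
    if semantics == "copy":
        # Copy parameters behave like local or dynamic variables.
        return "dynamic" if unknown_size else "local"
    if semantics is not None:
        raise ValueError(f"unknown parameter semantics: {semantics!r}")
    if storage_class in STATIC_CLASSES:
        return "static"
    if storage_class in LOCAL_CLASSES:
        return "dynamic" if unknown_size else "local"
    raise ValueError(f"unknown storage class: {storage_class!r}")
```

For example, classify_variable("automatic", unknown_size=True) yields "dynamic", the case that requires a CREATE tuple at run time.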

A tuple in a lexical block may refer to any variable which is declared in that lexical block, or in any of its ancestors
in the symbol table block tree. There are no problems referring to variables in the current routine, of course. Static
variables of another routine can be referred to directly. Local and dynamic variables of other routines require a "static
chain" to locate the stack frame in which the variable is declared. However, the back end 12 is completely responsible
for generating the code for creating and using static chains, provided that the front end correctly annotates the routine
blocks and variables.
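
As background, a static chain is conventionally a linked list of stack frames in which each frame points to the frame of its lexically enclosing routine. The sketch below shows that general technique only; it is not the code the back end generates, and the names Frame, static_link, and frame_for are hypothetical.

```python
class Frame:
    """One activation record of a routine."""

    def __init__(self, routine, static_link=None):
        self.routine = routine
        self.static_link = static_link  # frame of the enclosing routine
        self.variables = {}


def frame_for(current, hops):
    """Follow the static chain 'hops' levels outward from 'current'."""
    frame = current
    for _ in range(hops):
        if frame.static_link is None:
            raise LookupError("static chain is shorter than the hop count")
        frame = frame.static_link
    return frame


# INNER is nested in OUTER and updates OUTER's local variable 'total'.
outer = Frame("OUTER")
outer.variables["total"] = 10
inner = Frame("INNER", static_link=outer)
frame_for(inner, 1).variables["total"] += 5
```

The hop count is the difference in lexical nesting depth between the referencing routine and the declaring routine, which is known at compile time.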

There are several kinds of dynamic stack allocation: 

1. The stack storage for a dynamic variable is allocated by a CREATE tuple. It exists from the execution of the
CREATE tuple until control passes into a basic block which is not in the same lexical block as the CREATE tuple.
(This means that the CREATE tuple for a dynamic variable must be allocated in a basic block whose scope block
is the block in which the variable is declared; otherwise, its dynamic storage will be released while the variable is
lexically still in scope.)

2. Code to allocate the stack storage for an unknown size copy parameter is generated immediately following the
ENTRY tuple. Since ENTRY tuples must be in the main routine block, this storage exists until the routine returns.

3. A dynamic temporary may be created by the back end to hold the value of an aggregate expression. It exists
from the execution of the tuple which creates the value at least until the execution of the tuple which uses that value.

4. Stack space is allocated to hold the argument value for an aggregate ARGVALx tuple. It exists from the execution
of the ARGVALx tuple until the execution of the CALL tuple.

5. Stack space is allocated to hold a return value for an ARGTMPx tuple. It exists from the execution of the
ARGTMPx tuple until the evaluation of the RESULTx tuple which fetches the return value.

While this invention has been described with reference to specific embodiments, this description is not meant to
be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as other embodiments
of the invention, will be apparent to persons skilled in the art upon reference to this description. It is therefore
contemplated that the appended claims will cover any such modifications or embodiments as fall within the scope of the claims.

TABLE 1 

PREFIXING CONVENTIONS FOR GLOBAL AND EXPORTED NAMES

Names exported from packages

o Routine names have the form GEM$ZZ_name.
o Exported macro names have the form GEM$ZZ_name.
o Global variable names have the form GEM$ZZ_name.
o Literal names (whether global or exported) have the form GEM$ZZ_K_name.

Enumerated data types

o Every enumerated data type has a unique "type name."
o Each literal in the type XYZ has a name of the form GEM$XYZ_K_name.
o The names GEM$XYZ_K__FIRST and GEM$XYZ_K__LAST refer to the first and last values in the range of the
  type.

Aggregate data types

o Every aggregate data type has a unique "type name."
o Each field in the aggregate type XYZ has a name of the form GEM$XYZ_name.
o Sizes of particular variants of an aggregate type are literals with names of the form GEM$XYZ_name__SIZE.
o The size of an aggregate type as a whole (i.e., the size of its largest variant) is GEM$XYZ__SIZE.
o The name GEM$XYZ refers to the type declaration macro, whose expansion is BLOCK[GEM$XYZ__SIZE, BYTE]
  FIELDS(GEM$XYZ__FIELDS).



TABLE 2

DATA TYPES OF SHELL ROUTINE ARGUMENTS

Integer         32-bit (longword) signed integer. Passed by value.

String          A varying string (16-bit unsigned length word + text). Passed by reference.

Handle          A 32-bit (longword) value which is interpreted by the shell routines (often as the address of a
                shell internal data structure), but which has no meaning to the front end. Passed by value.

Block           Some data block whose structure is defined in the shell package specifications, and whose
                contents are used to communicate between the front end and the shell. Passed by reference.

Counted vector  A 32-bit unsigned count word, followed by the specified number of 32-bit components. The
                components of a vector may be integers, addresses of varying strings, handles, or addresses of
                blocks. Passed by reference.
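
The varying string and counted vector layouts can be illustrated by packing them into bytes. This is a sketch under stated assumptions: little-endian byte order is assumed, vector components are treated as unsigned 32-bit words for simplicity, and the function names are hypothetical rather than shell routines.

```python
import struct


def pack_varying_string(text: bytes) -> bytes:
    """16-bit unsigned length word followed by the text."""
    if len(text) > 0xFFFF:
        raise ValueError("text does not fit a 16-bit length word")
    return struct.pack("<H", len(text)) + text


def unpack_varying_string(buffer: bytes) -> bytes:
    (length,) = struct.unpack_from("<H", buffer, 0)
    return buffer[2:2 + length]


def pack_counted_vector(components) -> bytes:
    """32-bit unsigned count word followed by that many 32-bit components."""
    words = [struct.pack("<I", len(components))]
    words.extend(struct.pack("<I", c) for c in components)
    return b"".join(words)


def unpack_counted_vector(buffer: bytes) -> list:
    (count,) = struct.unpack_from("<I", buffer, 0)
    return list(struct.unpack_from(f"<{count}I", buffer, 4))
```

Because both types are passed by reference, a caller would hand the shell the address of such a buffer rather than the buffer contents.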
TABLE 3

GEM$XX_INIT

is called by the shell 11 almost as its first action. (The only things the shell does before calling GEM$XX_INIT are
to start the timing interval GEM$TM_G_ICB_CMPTTL (see <REFERENCE>(sect_shell_tm)), initialize the
debugging package (see <REFERENCE>(sect_shell_db)), and initialize the global variable
GEM$CP_G_ERROR_FCB to the output file handle of the "standard error" file.)

On return from GEM$XX_INIT, all the GEM$XX global variables listed below must be properly initialized. Other front
end initialization may also be done in GEM$XX_INIT, or it may be postponed until GEM$XX_PROCESS_GLOBALS
(see below).

Since the shell 11 does not do any command line processing until after calling GEM$XX_INIT, it is possible under
VAX/VMS to implement a GEM compiler with a foreign command instead of a DCL command by having
GEM$XX_INIT call LIB$GET_FOREIGN to read the command line and CLI$DCL_PARSE to set the command string
that the shell will process.



GEM$XX_PROCESS_GLOBALS

is called by the shell after it has processed the global qualifiers from the command line, but before it has processed
any command-line parameters or local qualifiers. This routine can examine the global qualifier blocks and take
whatever action is appropriate.

GEM$XX_PROCESS_LOCALS

is called by the shell 11 after it has processed the local qualifiers from the command line, but before it has opened
any files 21 specified by them. This routine can examine the local qualifier blocks and change their contents as
desired. This allows for dependencies between qualifiers that cannot be represented in the individual qualifier blocks.

GEM$XX_COMPILE

is called by the shell 11 after it has parsed a parameter plus-list and its qualifiers, filled in the local qualifier blocks,
and initialized GEM$TI with the input stream specified by the plus list. This routine is responsible for compiling that
input stream.



GEM$XX_FINI

is called by the shell as its very last action before it exits. This routine may do any front-end-specific clean-up.

The front end must also declare the following global variables. They must be defined by the time that GEM$XX_INIT
returns control to the shell 11. (They may be defined at link time, but this will require address fixups at image activation
time.)

GEM$XX_G_GLOBAL_QUALS

contains the address of a counted vector of pointers to the qualifier blocks for the compiler's global qualifiers (see
<REFERENCE>(sect_shell_cp)). These global qualifier blocks will be filled in by the shell before it calls
GEM$XX_PROCESS_GLOBALS.

GEM$XX_G_LOCAL_QUALS

contains the address of a counted vector of pointers to the qualifier blocks for the compiler's local qualifiers (see
<REFERENCE>(sect_shell_cp)). These local qualifier blocks will be filled in by the shell before each call to
GEM$XX_COMPILE.

GEM$XX_G_FAC_PREFIX

contains the address of a varying string containing the facility string to be used in constructing compiler messages.

GEM$XX_G_FAC_NUMBER

contains the integer facility code to be used in constructing compiler message codes.

GEM$XX_G_IN_DEFAULTS

contains the address of a counted vector of pointers to varying strings containing the default file specifications to
be used when opening source files specified in the command line parameters.

GEM$XX_G_LIB_DEFAULTS

contains the address of a counted vector of pointers to varying strings containing the default file specifications to
be used when opening text libraries specified as command line parameters with the /LIBRARY qualifier.

GEM$XX_G_PRODUCT_ID

contains the address of a varying string containing the product identification string to be used in header lines in the
listing file.

GEM$XX_G_PREFIX_LEN

contains an integer specifying the number of columns to be reserved for a prefix string (specified by the front end)
to be attached to source lines in the listing file.

The Virtual Memory Package (GEM$VM)

The virtual memory package provides a standard interface for allocating virtual memory. It supports the zoned
memory concept of the VMS LIB$VM facility; in fact, under VMS, GEM$VM is an almost transparent layer over
LIB$VM. However, the GEM$VM interface is guaranteed to be supported unchanged on any host system.

The Locator Package (GEM$LO)

A locator describes a range of source text 15 (starting and ending file, line, and column number). The text input
package returns locators for the source lines that it reads. Locators are also used in the symbol table 16 and
intermediate language nodes to facilitate message and debugger table generation, and are used for specifying
where in the listing file the listing package should perform actions. A locator is represented as a longword. The
locator package maintains a locator database, and provides routines to create and interpret locators.
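
One way to picture a locator that fits in a longword yet describes a full source range is as an index into a database kept by the package. The sketch below is only an assumption about how such a scheme could work; the text does not specify the encoding, and the names SourceRange, LocatorDatabase, create, and interpret are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRange:
    start_file: str
    start_line: int
    start_column: int
    end_file: str
    end_line: int
    end_column: int


class LocatorDatabase:
    """Hands out 32-bit locator values that stand for source ranges."""

    def __init__(self):
        self._ranges = []  # locator N is stored at index N - 1
        self._index = {}   # reuse a locator for an identical range

    def create(self, source_range: SourceRange) -> int:
        locator = self._index.get(source_range)
        if locator is None:
            locator = len(self._ranges) + 1  # 0 is kept for "no locator"
            if locator > 0xFFFFFFFF:
                raise OverflowError("locator no longer fits in a longword")
            self._ranges.append(source_range)
            self._index[source_range] = locator
        return locator

    def interpret(self, locator: int) -> SourceRange:
        if not 1 <= locator <= len(self._ranges):
            raise KeyError(f"unknown locator: {locator}")
        return self._ranges[locator - 1]
```

Under this assumption, symbol table and intermediate language nodes only need to store the small integer, and message or listing code calls interpret when it needs the file, line, and column.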
TABLE 4

INTERMEDIATE LANGUAGE DEFINITION FILES

GEM$ND_NODES.SDL   Contains several general type definitions, and includes all the SDL files listed below.
                   It defines the generic GEM$NODE aggregate type.

GEM_CONSTANTS.DAT  Contains the definitions of the node kind and node subkind enumerated types, as well
                   as a variety of other enumerated types.

GEM_CONSTANTS.SDL  The SDL translation of GEM_CONSTANTS.DAT. See Appendix D for a description of
                   the CONSTANTS program which does the translation.

BLK_NODE.SDL       Contains the definition of block nodes (GEM$BLOCK_NODE), identified by a value of
                   GEM$NODE_K_BLOCK in the node's kind field.

SYM_NODE.SDL       Contains the definition of symbol nodes (GEM$SYMBOL_NODE), identified by a value
                   of GEM$NODE_K_SYMBOL in the node's kind field.

FRM_NODE.SDL       Contains the definition of frame nodes (GEM$FRAME_NODE), identified by a value of
                   GEM$NODE_K_FRAME in the node's kind field.

LIT_NODE.SDL       Contains the definition of literal nodes (GEM$LITERAL_NODE), identified by a value
                   of GEM$NODE_K_LITERAL in the node's kind field.

PRM_NODE.SDL       Contains the definition of parameter nodes (GEM$PARAMETER_NODE), identified by
                   a value of GEM$NODE_K_PARAMETER in the node's kind field.

TPL_NODE.SDL       Contains the definition of tuple nodes (GEM$TUPLE_NODE), identified by a value of
                   GEM$NODE_K_CIL_TUPLE in the node's kind field.

DES_NODE.SDL       Contains the definition of destination nodes (GEM$DESTINATION_NODE), identified
                   by a value of GEM$NODE_K_DESTINATION in the node's kind field.

GEM$ND.L32         The library file which should be used by front ends coded in BLISS. It contains the
                   BLISS translation of the files listed above.






TABLE 5

Symbol Table and IL Routines

Routine                         Purpose

Initialization and Termination

GEM$ST_INIT                     Initialize the intermediate representation for a module.
GEM$ST_FINI                     Release all space that has been allocated for the intermediate
                                representation of a module.

Creating and Manipulating ILGs

GEM$IL_ALLOCATE_CIL_NODE        Allocate a CIL tuple node.
GEM$IL_ALLOCATE_DES_NODE        Allocate a destination node.
GEM$IL_FREE_DES_NODE            Deallocate a destination node.
GEM$IL_INSERT                   Insert a tuple or a list of tuples into a list of tuples.
GEM$IL_UNLINK                   Remove a tuple from a list of tuples.

Creating the Symbol Table

GEM$ST_ALLOCATE_BLOCK_NODE      Allocate a block node.
GEM$ST_ALLOCATE_FRAME_NODE      Allocate a storage frame node.
GEM$ST_ALLOCATE_MUTABLE_SYMBOL  Allocate a symbol node whose subkind can be changed.
GEM$ST_ALLOCATE_PARAMETER_NODE  Allocate a parameter list node.
GEM$ST_ALLOCATE_SYMBOL_NODE     Allocate a symbol node whose subkind cannot be changed.
GEM$ST_LOOKUP_LITERAL           Get a literal node for a specified literal value.
GEM$ST_LOOKUP_PSECT             Get a PSECT storage frame node with a specified name.
GEM$ST_MUTATE_SYMBOL            Change a subkind of a mutable symbol node.

Specifying Initial Values

GEM$ST_STORE_ADDRESS            Specify a symbol or PSECT address as the initial value of a
                                variable or PSECT location.
GEM$ST_STORE_BUFFER             Specify an arbitrary block of bytes as the initial value of a variable
                                or PSECT location.
GEM$ST_STORE_LITERAL            Specify the value of a literal node as the initial value of a variable
                                or PSECT location.



TABLE 6

Representational Types for Specific Target Architectures

Type      MIPS      64-bit RISC   VAX

Supported Arithmetic Types

INT8      Yes       Yes           Yes
INT16     Yes       Yes           Yes
INT32     Yes       Yes           Yes
INT64     No        Yes           No
UINT8     Yes       Yes           Yes
UINT16    Yes       Yes           Yes
UINT32    Yes       Yes           Yes
UINT64    No        Yes           No
REALF     No        Yes           Yes
REALD     No        Yes           Yes
REALG     No        Yes           Yes
REALH     No        No            Yes
REALS     Yes       Yes           No
REALT     Yes       Yes           No
REALQ     No        Yes           No
REALE     No        Yes           No
CMPLXF    No        Yes           Yes
CMPLXD    No        Yes           Yes
CMPLXG    No        Yes           Yes
CMPLXS    Yes       Yes           No
CMPLXT    Yes       Yes           No

Type Sizes

ADDR      32        64            32

Type Synonyms

IADDR     INT32     INT64         INT32
UADDR     UINT32    UINT64        UINT32
IMAX      INT32     INT64         INT32
UMAX      UINT32    UINT64        UINT32



TABLE 6a

New Tuple Fields for Induction Variable Detection

IV_IS_INDUCTIVE -  a flag indicating that TUPLE is an inductive expression with respect to the loop designated
                   by the loop top TUPLE[IV_LOOP]. At the end of the FIND_IV algorithm, this tuple is
                   inductive only if IV_BASIC is in the BASIC_IVS set of the loop designated by IV_LOOP.

IV_BASIC -         the basic induction variable candidate of TUPLE. If IV_BASIC is not in the basic induction
                   variable set of IV_LOOP after the FIND_IV algorithm has completed, then this tuple is
                   not inductive.

IV_LOOP -          the loop top of the innermost loop that TUPLE is inductive within.

Each inductive expression E defines a linear function on a basic induction variable I. That
is, E can be recast in terms of I by a function of the form:

    E = (a * I) + b

where "a" is the "coefficient" of the linear function, and "b" is the "offset." The
IV_COEFFICIENT field is an integer field containing the constant part of the coefficient.
The IV_NON_CONSTANT field is a flag indicating that the coefficient has non-constant
parts.

New Flow Node Fields

BASIC_IVS -        set of basic induction variable candidates for the loop represented by "this" loop top.
                   Initially, this is the set of all variables modified in the loop. Algorithm FIND_IV eliminates
                   the variables that don't conform to the rules for basic induction variables. Only valid for
                   loop tops.

CONDITIONAL_SET -  set of variables with stores that do not get executed exactly once on each complete trip
                   through the loop represented by "this" loop top. Presence in this set does NOT imply that
                   the variable is an induction variable. Only valid for loop tops.
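
The linear form E = (a * I) + b can be made concrete with a small sketch that tracks the coefficient and offset as an expression is built up from a basic induction variable. It covers only constant coefficients and offsets, so the case flagged by IV_NON_CONSTANT is not modeled; the class name LinearForm and its methods are hypothetical and are not part of the FIND_IV algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearForm:
    """An inductive expression E = (coefficient * I) + offset."""

    coefficient: int  # "a"
    offset: int       # "b"

    def evaluate(self, i: int) -> int:
        return self.coefficient * i + self.offset

    def scaled(self, k: int) -> "LinearForm":
        # k * (a*I + b) = (k*a)*I + (k*b)
        return LinearForm(self.coefficient * k, self.offset * k)

    def shifted(self, k: int) -> "LinearForm":
        # (a*I + b) + k = a*I + (b + k)
        return LinearForm(self.coefficient, self.offset + k)


# The basic induction variable itself is the form 1*I + 0.
basic = LinearForm(coefficient=1, offset=0)

# An address computation such as 4*I + 8 is still linear in I.
element_address = basic.scaled(4).shifted(8)
```

Because multiplying by a constant and adding a constant both preserve the linear form, an optimizer that knows a and b can replace the multiplication inside the loop with an addition of a on each trip.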
TABLE 7

Common Tuple Fields

Field             Meaning

Kind              The generic node kind field that occurs in every node.

Generic operator  The general operation performed by the tuple. This is just another name for the generic subkind
                  field that occurs in every node.

Operator type     A data type which, in conjunction with the generic operator, determines the specific operation
                  performed by the tuple.
                  The operator type is usually, but not always, the same as the data type of one or more operands
                  (particularly the first operand) of the tuple. Note that it is not necessarily the same as the data
                  type of the value computed by the tuple. For example, ADD.INT16 adds two INT16 operands
                  and produces an INT16 result, but LSS.INT16 compares two INT16 operands and produces a
                  BOOL result, and STORE.INT16 stores an INT16 value in a memory location and doesn't have
                  a result.

Result type       The type of the value computed by this tuple. For most operators the result type is determined
                  by the operator type, but for some operators the result type is independent of the operator type,
                  and the specific operation performed by the tuple depends on both types.

Operands          An array of pointers to the operands of this tuple. The number of operands is determined by
                  the generic operator. Each operand pointer points to another IL tuple node or, in the CIL only,
                  to a symbol or literal node. The individual operand pointer fields may be referred to as op1, op2,
                  etc.

Next tuple        Pointers to the next and previous tuples in a doubly-linked list of tuples. The next tuple order
                  is the implicit order of evaluation. In the CIL, all the tuples in the ILG are linked together, while
                  in the EIL, the tuples in each basic block form a separate list.

Locator           The textual location in the program source of the token or tokens which were compiled into this
                  tuple. It is used in constructing error messages, source correlation tables, etc. (Locators are
                  described in the GEM$LO package specification.)

Expr count        Used only in EIL tuples, where it is set by the back end. The expr count field is discussed in
                  CT029, Interface for Representing Effects.
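
The next tuple field makes the tuples a doubly-linked list whose order is the order of evaluation. The sketch below illustrates that structure with insert and unlink operations in the spirit of GEM$IL_INSERT and GEM$IL_UNLINK; the class ILTuple and the functions insert_after, unlink, and evaluation_order are hypothetical and do not reproduce the signatures of those routines.

```python
class ILTuple:
    """A tuple node with a few of the common fields."""

    def __init__(self, operator, operator_type=None, operands=()):
        self.operator = operator            # generic operator
        self.operator_type = operator_type  # e.g. "INT16"
        self.operands = list(operands)      # op1, op2, ...
        self.next = None
        self.prev = None


def insert_after(anchor, new):
    """Link 'new' into the list immediately after 'anchor'."""
    new.prev = anchor
    new.next = anchor.next
    if anchor.next is not None:
        anchor.next.prev = new
    anchor.next = new


def unlink(node):
    """Remove 'node' from whatever list it is in."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None


def evaluation_order(first):
    """Operator names in next-tuple order, i.e. the implicit evaluation order."""
    names = []
    node = first
    while node is not None:
        names.append(node.operator)
        node = node.next
    return names


fetch = ILTuple("FETCH", "INT16")
add = ILTuple("ADD", "INT16", [fetch, fetch])
store = ILTuple("STORE", "INT16", [add])
insert_after(fetch, store)
insert_after(fetch, add)  # list is now FETCH, ADD, STORE
```

Operands point at other tuples independently of the list links, so a tuple can be an operand of a later tuple without being adjacent to it.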



TABLE 8

Headings in Tuple Dictionary Entries

Heading       Description

Operator      The name of the operator appears at the top of the dictionary page. This name may be prefixed with
              GEM$TPL_K_ to yield the actual constant used in GEM code.

Overview      The tuple overview appears directly below the operator name. It explains in one or two sentences
              what a tuple with this operator will do.

Format        The tuple format follows the tuple overview. It specifies the number of operands that the operator
              takes and the allowable operator types, operand types, and result types.

Attributes    Attributes are tuple fields other than the common fields listed in Table 7. The attributes section follows
              the format section, and lists all the attributes that are used in the tuple. The meanings of the attributes
              are generally summarized in the restrictions and description sections.

Value         The value section follows the attributes section. It provides a detailed description of the value
              returned by the tuple as a function of its operands.

Restrictions  The restrictions section follows the value section. It describes restrictions on the use of the tuple.
              Restrictions generally fall into one of the following categories:
              (a) The tuple can be used only in the CIL or the EIL.
              (b) The tuple must occur in a particular context in an ILG, or must be an operand of a particular
              kind of tuple.
              (c) Certain operands of the tuple must be tuples with specific operators.
              (d) Certain attribute fields of the tuple must contain pointers to particular kinds of nodes.
              Only structural (syntactic) restrictions on the form of the ILG are documented in this section. Runtime
              restrictions, such as the requirement that the length operand of a substring tuple must not be
              negative, are given in the description section.

Description   The description section follows the restrictions section, and describes the effects of the tuple. It also
              gives miscellaneous information about the tuple such as runtime requirements on its operand values,
              error conditions that can occur, and particular source language constructs that the tuple is provided
              to support.



TABLE 9

Data Access Operators

Operator    Meaning

Fetch Operators

FETCH       Fetches a representational value.
FETCHA      Fetches a signed integer with sign extension or an address or unsigned
            integer with zero extension from a packed array element.
FETCHF      Fetches a character or bit string with a specified length.
FETCHS      Fetches a character or bit substring, that is, a string with a specified
            length and specified character or bit offset from a base address.
FETCHV      Fetches a varying length character string, that is, one whose length is in
            the word preceding the text of the string.
FETCHX      Fetches a signed integer with sign extension or an address or unsigned
            integer with zero extension from a bit field.
FETCHZ      Fetches a null-terminated character string.
FETCHZA     Fetches a signed integer with zero extension from a packed array
            element.
FETCHZX     Fetches a signed integer with zero extension from a bit field.

Store Operators

STORE       Stores a representational value.
STOREA      Stores an integer or address value in a packed array element.
STOREF      Stores a character or bit string.
STORES      Stores a character or bit substring, that is, stores a string with a specified
            length at a specified character or bit offset from a base address.
STOREV      Stores a varying length character string, that is, stores the text of the
            string following a word containing the length of the string.
STOREX      Stores an integer or address value in a bit field.
STOREZ      Stores a null-terminated character string, that is, stores the text of the
            string followed by a null character (all zero bits).
VSTORE      Stores an arithmetic or address value, and yields the value that was
            stored.
VSTOREA     Stores an integer or address value in a packed array element, and yields
            the value that was stored.
VSTOREX     Stores an integer or address value in a bit field, and yields the value that
            was stored.

Increment Operators

POSTINCR POSTINCRA POSTINCRX
            Fetches a representational value from a variable, from a packed array
            element, or from a bit field, adds a compile-time constant increment to
            it, stores the result back into memory, and yields the initial
            (unincremented) value.
PREINCR PREINCRA PREINCRX
            Fetches a representational value from a variable, from a packed array
            element, or from a bit field, adds a compile-time constant increment to
            it, stores the result back into memory, and yields the incremented value.

Variable Modification Operators

            These operators fetch a value from a variable, a packed array element,
            or a bit field, perform an arithmetic operation between the fetched value
            and another operand value, store the result of the arithmetic operation
            back into the original memory location, and yield the updated value.

ADDMOD ADDMODA ADDMODX
            Adds some value to the arithmetic value in a memory location.
DIVMOD DIVMODA DIVMODX
            Divides the arithmetic value in a memory location by some value.
IANDMOD IANDMODA IANDMODX
            "And"s the integer value in a memory location with some value.
IORMOD IORMODA IORMODX
            "Or"s the integer value in a memory location with some value.
IXORMOD IXORMODA IXORMODX
            "Exclusive or"s the integer value in a memory location with some value.
MULMOD MULMODA MULMODX
            Multiplies the arithmetic value in a memory location by some value.
REMMOD REMMODA REMMODX
            Takes the remainder of the arithmetic value in a memory location with
            respect to some value.
SHRMOD SHRMODA SHRMODX
            Shifts the integer value in a memory location right by some value.
SUBMOD SUBMODA SUBMODX
            Subtracts some value from the arithmetic value in a memory location.
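
The difference between the increment operators and the variable modification operators lies in which value the tuple yields. The sketch below models memory as a Python dictionary to show this; the function names postincr, preincr, and op_mod are hypothetical, and the packed array element and bit field variants are not modeled.

```python
import operator


def postincr(memory, address, increment):
    """Add the increment in place and yield the initial value."""
    initial = memory[address]
    memory[address] = initial + increment
    return initial


def preincr(memory, address, increment):
    """Add the increment in place and yield the incremented value."""
    memory[address] = memory[address] + increment
    return memory[address]


def op_mod(memory, address, operation, operand):
    """Apply 'operation' to the stored value and the operand, store the
    result back, and yield the updated value (the opMOD pattern)."""
    memory[address] = operation(memory[address], operand)
    return memory[address]


memory = {"I": 5, "FLAGS": 0b1100}
yielded_post = postincr(memory, "I", 1)                      # I becomes 6
yielded_pre = preincr(memory, "I", 1)                        # I becomes 7
yielded_add = op_mod(memory, "I", operator.add, 10)          # like ADDMOD
yielded_and = op_mod(memory, "FLAGS", operator.and_, 0b0100) # like IANDMOD
```

These correspond to source constructs such as the C expressions i++, ++i, i += 10, and flags &= 4, which is presumably why the operators yield a value rather than only performing a store.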



TABLE 10

Attributes of Data Access Tuples

Attribute    Meaning

Offset       A constant offset (in bytes) to be added to the address operand for the fetch or store operation.

Effects      A longword which is reserved for use by the front end. GEM will never examine this field (except
             when propagating it during IL expansion). It is intended as a place for the front end to save
             information about the memory locations affected or accessed by the tuple. See CT.029 for more
             details.

Effects2     Not used in FETCH and STORE tuples. For a PREINCR, POSTINCR, or opMOD tuple, effects
             pertains to the "read effects" (dependencies) of the tuple while effects2 pertains to its "write effects."

Base symbol  Base symbols are described in CT070, Data Access Model.

Must read    Not used in STORE tuples. Indicates to the optimizer that the variable being fetched may have been
             written, through some mechanism not otherwise detectable in the IL, subsequent to any prior fetches
             or stores, and that it therefore must not be assumed to have the same value that it had at the time
             of any prior fetch or store. IL expansion will automatically set the must read flag of a fetch whose
             base symbol has the has volatile writes attribute.

Must write   Not used in FETCH tuples. Indicates to the optimizer that the variable being written may be read,
             through some mechanism not otherwise detectable in the IL, prior to any subsequent fetches, and
             that this store must therefore be performed, even if no fetches are detectable prior to any subsequent
             stores. IL expansion will automatically set the must write flag of a store whose base symbol has
             the has volatile reads attribute.



TABLE 11 





Arithmetic Operators 


15 


Operator 


Rfleaning 






Fetch Operators 




FETCH 


Fetches a representational value. 




FETCHA 


Fetches a signed integer with sign extension or an address or unsigned 


20 




integer with zero extension from a packed array element. 




crcTouv 
rt 1 UnA 


Fetches a signed integer with sign extension or an address or unsigned 






integer with zero extension from a bit field. 




rc 1 Lrri£J\ rt 1 OMiLA 


Fetches a signed integer with zero extension from a packed array 






element. Fetches a signed Integer with zero extension from a bit fiekJ. 






Store Operators 




STORE 


Stores a representational value. 




STOREA 


Stores an integer or address value In a packed array element 


30 


STOREX 


Stores an integer or address value in a bit field. 


VSTORE 


Stores an arithmetk: or address value, and yields the value that was 






stored. 




VSTOREA 


Stores an Integer or address value in a packed array element, and yields 






the value that was stored. 


35 


VSTOREX 


Stores an integer or address value in a bit field, and yields the value that 






was stored. 






Arithmetic Computations 




ABS 


Computes the absolute value of Its operand. 


40 


ADD 


Computes the sum of its operands. 




CADD 


Computes the sum of a complex and a real operand. 




CDIV 


Computes the quotient of a complex and a real operand. 




CEIL 


Computes the smallest integer value which Is not less than the value of 


45 




Its real operand. 


CMUL 


Computes the product of a complex and a real operand. 




CONJG 


Computes the complex conjugate of its operand. 




CREVSUB 


Computes the difference of a complex and a real operand. 




CSUB 


Computes the difference of a complex and a real operarKl. 


SO 


DIV 


Computes the quotient of Its two operands. 




FLOOR 


Computes the largest integer value which is not greater than the value 






of Its real operand. 




IPWR 


Computes Its first operand raised to the power of its Integer second 






operand, signalling an error if both operands are zero. 


55 


IPWRO 


Computes its first operand raised to the power of its integer second 






operand, yieMing one if both operands are zero. 
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TABLE 11 (continued) 



Arithmetic Operators 


Operator 


RAeaning 




Arithmetie Computations 


IPWRZ 


Computes rts first operand raised to the power of its integer second 




operand, yielding zero if both operands are zero. 


MAX 


Computes the maximum of its operands. 


MIN 


Computes the minimum of its operands. 


MOO 


Computes the nnathematical modulus of its operands (The Ada and PU 




1 MOD operators). 


MUL 


Computes the product of its operands. 


NEG 


Comoutes the neoative or twos-^^rimnlAmAnt nf itQ onoranH 


PMOD 


Computes the mathematical modulus of its operands, where the divisor






PWR 


Computes its first operand raised to the power of its second operand, signalling an error if both operands are zero.


PWRO


Computes its first operand raised to the power of its second operand, yielding one if both operands are zero.


PWRZ 


Computes its first operand raised to the power of its second operand, yielding zero if both operands are zero.


REM


Computes the remainder of its operands (the FORTRAN MOD function, BLISS MOD operator, C % operator, and Pascal and Ada REM operators).






ROUND 


Rounds the fractional part of a real number to the nearest integer value.


SUB 


Computes the difference of its operands.


TRUNC 


Truncates the fractional part of a real number towards zero.




Shifting and Masking


IAND


Computes the bitwise conjunction of two integers.


IEQV


Computes the bitwise equivalence of two integers.


INOT


Computes the bitwise complement of an integer.


IOR


Computes the bitwise disjunction of two integers.


IXOR


Computes the bitwise exclusive or of two integers.


ROT


Rotates an integer value.


SHL


Shifts an integer value left by a positive shift count.


SHR


Shifts an integer value right by a positive shift count.


SH


Shifts an integer value left or right, depending on the sign of a shift count.




Mathematical Computations 


ACOS 


Computes the arc cosine in radians of its operand. 


ACOSD 


Computes the arc cosine in degrees of its operand. 


ASIN 


Computes the arc sine in radians of its operand. 


ASIND 


Computes the arc sine in degrees of its operand. 


ATAN 


Computes the arc tangent in radians of its operand. 


ATAND 


Computes the arc tangent in degrees of its operand.


ATAN2 


Computes the arc tangent in radians of the ratio of its two operands. 


ATAND2


Computes the arc tangent in degrees of the ratio of its two operands.


COS 


Computes the cosine of its operand, which is specified in radians. 


COSD 


Computes the cosine of its operand, which is specified in degrees. 


COSH 


Computes the hyperbolic cosine of its operand. 


EXP 


Computes the exponential (e to the power) of its operand. 


LOG 


Computes the base-e logarithm of its operand.




LOG2 


Computes the base-2 logarithm of its operand. 




LOG10 


Computes the base-10 logarithm of its operand. 






SIN


Computes the sine of its operand, which is specified in radians.


SIND


Computes the sine of its operand, which is specified in degrees.


SINH


Computes the hyperbolic sine of its operand.


SQRT


Computes the square root of its operand.


TAN


Computes the tangent of its operand, which is specified in radians.


TAND


Computes the tangent of its operand, which is specified in degrees.


TANH


Computes the hyperbolic tangent of its operand.






Conversions 




CAST 


Yields the value of an arithmetic type which has the same bit pattern as some value of some other type.


CMPLX


Constructs a complex number from two real operands.


CVT


Translates a value of one arithmetic type into a value of another arithmetic type.


IMAG


Takes the imaginary part of a complex number.


REAL


Takes the real part of a complex number.


ROUND


Converts a real number to an integer value by rounding the fractional part.


TRUNC


Converts a real number to an integer value by truncating the fractional part toward zero.


Converts a value of one integer type to another integer type, discarding excess significant bits in the representation of the converted value.










EQL 


Tests if one arithmetic value is equal to another.


GEQ


Tests if one arithmetic value is greater than or equal to another.


GTR


Tests if one arithmetic value is greater than another.


LSS


Tests if one arithmetic value is less than another.


LEQ


Tests if one arithmetic value is less than or equal to another.


NEQ


Tests if one arithmetic value is different from another.






Variable Modification Operators


ADDMOD ADDMODA ADDMODX


Adds some value to the arithmetic value in a memory location.


DIVMOD DIVMODA DIVMODX


Divides the arithmetic value in a memory location by some value.


IANDMOD IANDMODA IANDMODX


"And"s the integer value in a memory location with some value.


IORMOD IORMODA IORMODX


"Or"s the integer value in a memory location with some value.


IXORMOD IXORMODA IXORMODX


"Exclusive or"s the integer value in a memory location with some value.


MULMOD MULMODA MULMODX


Multiplies the arithmetic value in a memory location by some value.


REMMOD REMMODA REMMODX


Takes the remainder of the arithmetic value in a memory location with respect to some value.


SHLMOD SHLMODA SHLMODX


Shifts the integer value in a memory location left by some value.


SHRMOD SHRMODA SHRMODX


Shifts the integer value in a memory location right by some value.


SUBMOD SUBMODA SUBMODX


Subtracts some value from the arithmetic value in a memory location.


Increment Operators 


POSTINCR POSTINCRA POSTINCRX

Fetches a representational value from a variable, from a packed array
element, or from a bit field, adds a compile-time constant increment to
it, stores the result back into memory, and yields the initial
(unincremented) value.

PREINCR PREINCRA PREINCRX

Fetches a representational value from a variable, from a packed array
element, or from a bit field, adds a compile-time constant increment to
it, stores the result back into memory, and yields the incremented value.
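
The distinction Table 11 draws between MOD and REM, and between the three integer power operators, can be illustrated with a minimal Python sketch. The function names `il_mod`, `il_rem` and `ipwr`, and the `zero_case` parameter, are hypothetical names chosen for this illustration; they are not part of the interface described here. The sketch assumes that the mathematical modulus takes the sign of the divisor (as the Ada MOD operator does) and that the remainder takes the sign of the dividend (as the C % operator does).

```python
def il_mod(a: int, b: int) -> int:
    """Mathematical modulus: the result takes the sign of the divisor."""
    return a % b  # Python's % already behaves this way for integers


def il_rem(a: int, b: int) -> int:
    """Remainder: the result takes the sign of the dividend."""
    quotient = abs(a) // abs(b)      # magnitude of the truncated quotient
    if (a < 0) != (b < 0):
        quotient = -quotient         # restore the sign, truncating toward zero
    return a - b * quotient


def ipwr(base: int, exponent: int, zero_case: str = "error") -> int:
    """Integer power with the three 0**0 policies (error, one, zero)."""
    if base == 0 and exponent == 0:
        if zero_case == "error":     # IPWR-like: signal an error
            raise ArithmeticError("0 raised to the power 0")
        return 1 if zero_case == "one" else 0   # IPWRO-like / IPWRZ-like
    return base ** exponent
```

The two operators agree whenever both operands have the same sign and differ only when the signs are mixed, which is why a front end has to pick the one that matches its source language.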




TABLE 12 





Character and Bit String Operators 




Operator 


Meaning 




Fetch Operators 


FETCHF

Fetches a character or bit string with a specified length.

FETCHS

Fetches a character or bit substring, that is, a string with a specified length and specified character
or bit offset from a base address.

FETCHV

Fetches a varying length character string, that is, one whose length is in the word preceding the
text of the string.

FETCHZ

Fetches a null-terminated character string.






Store Operators 


STOREF

Stores a character or bit string.

STORES

Stores a character or bit substring, that is, stores a string with a specified length at a specified
character or bit offset from a base address.

STOREV

Stores a varying length character string, that is, stores the text of the string following a word
containing the length of the string.

STOREZ

Stores a null-terminated character string, that is, stores the text of the string followed by a null
character (all zero bits).






String Manipulations 


CONCAT

Computes a string consisting of all the elements of one string followed by all the elements of another
string.

FILL

Creates a copy of a character string, padded to a specified length with copies of a specified
character.

REPLICATE

Creates the string which is the concatenation of a specified number of copies of another string.

SUBSTR

Extracts a substring from a specified string with a specified starting position and length.

TRANSLATE

Creates a copy of one character string, using another character string as a translation table.






Bit String Logical Operators 


BAND

Computes the bitwise conjunction ("set intersection") of two bit strings.

BDIFF

Computes the bitwise difference ("set subtraction") of two bit strings.

BEQV

Computes the bitwise equivalence of two bit strings.

BNOT

Computes the bitwise negation ("set complement") of a bit string.

BOR

Computes the bitwise disjunction ("set union") of two bit strings.

BXOR

Computes the bitwise exclusive or ("set difference") of two bit strings.







Conversions


ELEMENT

Extracts a single element from a character or bit string and yields it as a CHAR or as an IMAX zero
or one.

SCAST

Yields the string with the same bit pattern as some other value.

USTRING

Creates a string consisting of a single character.






Position and Size Functions 


INDEX

Computes the location of the first occurrence of one character string within another.

LENGTH

Computes the length of a string.

PINDEX

Computes the location of the first occurrence of one string within another, but yields -1 if both strings
are empty.

PSEARCH

Computes the location of the first character in one character string that is also found in another
character string, but yields -1 if both strings are empty.

PVERIFY

Computes the location of the first character in one character string that is not also found in another
character string, but yields -1 if both strings are empty.

SEARCH

Computes the location of the first character in one character string that is also found in another
character string.

VERIFY

Computes the location of the first character in one character string that is not also found in another
character string.






Unpadded Comparisons 


EQL

Tests if one string is equal to another.

GEQ

Tests if one string is greater than or equal to another.

GTR

Tests if one string is greater than another.

LEQ

Tests if one string is less than or equal to another.

LSS

Tests if one string is less than another.

NEQ

Tests if one string is different from another.




Padded Comparisons 


EQLP

Tests if one padded string is equal to another.

GEQP

Tests if one padded string is greater than or equal to another.

GTRP

Tests if one padded string is greater than another.

LEQP

Tests if one padded string is less than or equal to another.

LSSP

Tests if one padded string is less than another.

NEQP

Tests if one padded string is different from another.






Set Constructors 


BRANGE

Creates a new bit string by setting a contiguous sequence of bits to one in an existing bit string.

BSINGLE

Creates a new bit string by setting a single bit to one in an existing bit string.

ZEROBITS

Creates a bit string of a specified number of zero bits.






Set Predicates 


MEMBER

Tests whether a bit string has a one bit at a specified index.

SUPERSET

Tests whether every one bit in a bit string is also a one bit in another bit string.

SUBSET

Tests whether every one bit in a bit string is also a one bit in another bit string.
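
The set constructors, set predicates and bit string logical operators of Table 12 treat a bit string as a set whose members are the indices of its one bits. The following Python sketch illustrates that reading by holding a bit string in an ordinary integer. The function names are hypothetical lower-case stand-ins for the operators, and the sketch assumes 0-origin bit indices and an inclusive range for the BRANGE-like constructor; the text above does not state either convention.

```python
def bsingle(bits: int, index: int) -> int:
    """Return a copy of bits with the single bit at index set to one."""
    return bits | (1 << index)


def brange(bits: int, low: int, high: int) -> int:
    """Return a copy of bits with bits low..high (inclusive, assumed) set to one."""
    mask = ((1 << (high - low + 1)) - 1) << low
    return bits | mask


def member(bits: int, index: int) -> bool:
    """Test whether bits has a one bit at index."""
    return ((bits >> index) & 1) == 1


def subset(a: int, b: int) -> bool:
    """Test whether every one bit in a is also a one bit in b."""
    return (a & ~b) == 0


def bdiff(a: int, b: int) -> int:
    """Bitwise difference ("set subtraction"): bits in a that are not in b."""
    return a & ~b


# Build the set {0, 2, 3, 4} starting from an all-zero bit string.
example = bsingle(brange(0, 2, 4), 0)
```

With this representation the BAND, BOR and BXOR operators correspond to the integer operators `&`, `|` and `^`.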



TABLE 13





Boolean Operators 




Operator 


Meaning 






Predicates 




LBSET

Tests whether the least significant bit of an integer value is set.

NONZERO

Tests whether an integer value is nonzero.




Representation 




ALLBITS

Yields an integer -1 (or its unsigned equivalent) for true or 0 for false.

LSBIT

Yields an integer 1 for true or 0 for false.




Relations 


EQL

Tests if one scalar or string value is equal to another.

EQLBLK

Tests if two blocks of bytes in memory are the same.

EQLP

Tests if one padded string is equal to another.

GEQ

Tests if one scalar or string value is greater than or equal to another.

GEQP

Tests if one padded string is greater than or equal to another.

GTR

Tests if one scalar or string value is greater than another.

GTRP

Tests if one padded string is greater than another.

LEQ

Tests if one scalar or string value is less than or equal to another.

LEQP

Tests if one padded string is less than or equal to another.

LSS

Tests if one scalar or string value is less than another.

LSSP

Tests if one padded string is less than another.

MEMBER

Tests whether a bit string has a one bit at a specified index.

NEQ

Tests if one scalar or string value is different from another.

NEQBLK

Tests if two blocks of bytes in memory are different from one another.

NEQP

Tests if one padded string is different from another.

SUPERSET

Tests whether every one bit in a bit string is also a one bit in another bit string.

SUBSET

Tests whether every one bit in a bit string is also a one bit in another bit string.




Logical Functions 


LAND

Computes the logical conjunction of two Boolean values.

LANDC

Computes the logical conjunction of two Boolean values, "short-circuiting" evaluation of the second
operand if the first is false.

LANDU

Computes the logical conjunction of two Boolean values, guaranteeing that both operands will be
evaluated.

LEQV

Computes the logical equivalence of two Boolean values.

LNOT

Computes the logical complement of a Boolean value.

LOR

Computes the logical disjunction of two Boolean values.

LORC

Computes the logical disjunction of two Boolean values, "short-circuiting" evaluation of the second
operand if the first is true.

LORU

Computes the logical disjunction of two Boolean values, guaranteeing that both operands will be
evaluated.

LXOR

Computes the logical exclusive or of two Boolean values.




Conditional Expressions


SEL

Selects one of two values, depending on a Boolean selector.

SELC

Evaluates one of two expressions, depending on a Boolean selector.

SELU

Selects one of two values, depending on a Boolean selector, but guarantees that both operands
will be evaluated.



Operand Delimiter


FLOWMARK


Marks the beginning of the tuple sequence for an operand of a LAND, LOR, SEL, LANDC, LORC,
or SELC tuple.



Flow Control


ASSERT

Signals an exception condition if a Boolean value is false.

BRCOND

Branches to one of two destinations depending on a Boolean value.
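
The difference between the "C" (short-circuiting) and "U" (unconditional) variants in Table 13 lies only in which operands get evaluated, not in the value produced. The Python sketch below models each operand as a callable so that evaluation can be observed. The function names and the `traced` helper are hypothetical and exist only for this illustration.

```python
def landc(first, second) -> bool:
    """LANDC-like: the second operand is evaluated only if the first is true."""
    return first() and second()


def landu(first, second) -> bool:
    """LANDU-like: both operands are always evaluated."""
    a = first()
    b = second()
    return a and b


def selc(selector: bool, if_true, if_false):
    """SELC-like: only the selected expression is evaluated."""
    return if_true() if selector else if_false()


def selu(selector: bool, if_true, if_false):
    """SELU-like: both operands are evaluated, then one value is selected."""
    t = if_true()
    f = if_false()
    return t if selector else f


def traced(log: list, name: str, value):
    """Build an operand that records its own evaluation in log."""
    def operand():
        log.append(name)
        return value
    return operand
```

The unconditional forms matter when an operand has side effects or can signal an exception that the source language requires to be observed.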



TABLE 14 



Checking Operators 


Operator 


Meaning 


ASSERT

Signals an exception if a Boolean value is false.

CHKEQL

Signals an exception if two values are not equal.

CHKGEQ

Signals an exception if one value is less than another.

CHKGTR

Signals an exception if one value is less than or equal to another.

CHKLENEQL

Signals an exception if the length of a string is not equal to a specified integer.

CHKLENGTR

Signals an exception if the length of a string is less than or equal to a specified integer.

CHKLENLSS

Signals an exception if the length of a string is greater than or equal to a specified integer.

CHKLEQ

Signals an exception if one value is greater than another.

CHKLSS

Signals an exception if one value is greater than or equal to another.

CHKNEQ

Signals an exception if one value is equal to another.

CHKRANGE

Signals an exception if one value does not fall in the inclusive range bounded by two other values.

SIGNALS

Unconditionally signals an exception.
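
Each checking operator in Table 14 is named for the relation it expects to hold and signals an exception when that relation fails, so the description always states the negated condition. A minimal Python sketch of three of them follows. The `CheckFailure` exception class and the function names are hypothetical; how an exception is actually signalled is not specified in the table.

```python
class CheckFailure(Exception):
    """Stand-in for the exception signalled by a failed check."""


def chkgeq(value, bound) -> None:
    """CHKGEQ-like: require value >= bound, so signal when value < bound."""
    if value < bound:
        raise CheckFailure(f"{value!r} is less than {bound!r}")


def chklengtr(string, length: int) -> None:
    """CHKLENGTR-like: require len(string) > length."""
    if len(string) <= length:
        raise CheckFailure(f"length {len(string)} is not greater than {length}")


def chkrange(value, low, high) -> None:
    """CHKRANGE-like: require low <= value <= high (inclusive bounds)."""
    if not (low <= value <= high):
        raise CheckFailure(f"{value!r} is outside [{low!r}, {high!r}]")


def signals_failure(check, *args) -> bool:
    """Helper for the examples: report whether a check signalled."""
    try:
        check(*args)
    except CheckFailure:
        return True
    return False
```

A front end might emit such a check before an array subscript or a subrange assignment, for example `chkrange(index, 1, 10)`.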



TABLE 15 





Flow Control Operators




Operator 


Meaning 






Branch Targets 




BEGIN 


Marks the beginning of the ILG for a routine. 




ENTRY 


Represents an entry point of a routine. 




LABEL 


Represents a branch target. 




VLABEL 


Represents a virtual basic block.


HANDLER 


TBS. 






Branches 




BRANCH 


Branches unconditionally to a specified destination. 




BRARITH 


Branches to one of three destinations, depending on whether an arithmetic value is negative,
zero, or positive.




BRCOND 


Branches to one of two destinations, depending on whether a Boolean value is true or false.




BRSEL 


Chooses the destination whose low test and high test constants enclose the value of an integer
selector.




ENTRYPTR 


Relates a routine's BEGIN tuple to its ENTRY tuples.


ESTLABEL 


TBS. 




ESTENTRY 


TBS. 




VBRANCH 


Relates a VLABEL to a set of actual possible destinations in a virtual basic block.




Indirect Branches 


JUMP

Transfers control through a "bound label variable," which may involve restoring the context of
an outer routine.

JUMPLOCAL

Transfers control to a specified address, which is assumed to be the address of a label in the
current routine.






Flow Termination


JUMPSYMBOL

Does a non-local goto to a specified label symbol in a routine that contains the current routine.

RETURN

Terminates the current routine and returns control to the routine that called it, immediately
following the call.

STOP

Terminates the current routine and returns control to the routine that called it. Also informs GEM
that this routine will never be called again (i.e., that this return terminates program execution).
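
The multi-way branches of Table 15 can be read as small dispatch functions. The Python sketch below shows one possible reading of BRARITH (three-way branch on sign) and BRSEL (range-based selection). The function names, the representation of destinations as plain strings, the inclusive bounds, and the `default` destination used when no range encloses the selector are all assumptions made for this illustration; the table does not say what happens when no range matches.

```python
def brarith(value, negative: str, zero: str, positive: str) -> str:
    """BRARITH-like: pick one of three destinations from the sign of value."""
    if value < 0:
        return negative
    if value == 0:
        return zero
    return positive


def brsel(selector: int, cases, default: str) -> str:
    """BRSEL-like: pick the destination whose (low, high) constants enclose selector.

    cases is a sequence of (low, high, destination) triples.
    """
    for low, high, destination in cases:
        if low <= selector <= high:
            return destination
    return default  # assumed fallback, not described in the table


# A case statement with the arms 1..3, 4 and 10..19, written as BRSEL ranges.
arms = [(1, 3, "L_small"), (4, 4, "L_four"), (10, 19, "L_teens")]
```

A single-valued case arm is simply a range whose low and high constants are equal.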



TABLE 16 





Parameter Symbol Flags That Affect The Choice Between Copy And Bind Semantics 




Flag 


Meaning


Must bind


Requires that the parameter be implemented with bind semantics. If must bind is specified,
then the other flags listed below are ignored.


Conceal alias effects


Indicates that alias effects must not occur. Basically, this requires that the parameter be
implemented with copy semantics.


Expose alias effects


Indicates that alias effects must be visible. Basically, this requires that the parameter be
implemented with bind semantics.

If neither conceal alias effects nor expose alias effects is specified, then GEM need not
worry about alias effects. (It will probably use copy semantics for scalar parameters and
bind semantics for aggregate parameters.) It is an error for the front end to set both of these
flags.


Input


Indicates that the calling routine may have initialized the actual storage location prior to the
call. If copy semantics are used for this routine, then the actual storage location must be
copied to the local storage area at routine entry.


Output


If this flag is set, then the calling routine expects the actual storage location to contain the
final value of the parameter upon return from the call. If it is not set, then the calling routine
expects the actual storage location to be unaffected by the call.

If the output flag is false, then the parameter must have copy semantics. If it is true and
copy semantics are used, then the local storage location must be copied to the actual
storage location before the routine returns.
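
The flag rules of Table 16 amount to a small decision procedure. The following Python sketch is one possible reading of it, not a description of what GEM does internally. The `ParamFlags` class, the `choose_semantics` and `copy_steps` functions and their result strings are hypothetical. Two points are assumptions: the scalar/aggregate default follows the "probably" remark in the table, and the table does not say how an expose flag combined with a false output flag should be resolved, so the order of the tests below is a guess.

```python
from dataclasses import dataclass


@dataclass
class ParamFlags:
    must_bind: bool = False
    conceal_alias_effects: bool = False
    expose_alias_effects: bool = False
    input: bool = False
    output: bool = False


def choose_semantics(flags: ParamFlags, is_scalar: bool) -> str:
    """Return "copy" or "bind" for a parameter, following Table 16."""
    if flags.must_bind:
        return "bind"                      # all other flags are ignored
    if flags.conceal_alias_effects and flags.expose_alias_effects:
        raise ValueError("front end set both conceal and expose")
    if flags.conceal_alias_effects:
        return "copy"
    if flags.expose_alias_effects:
        return "bind"
    if not flags.output:
        return "copy"                      # output false requires copy semantics
    return "copy" if is_scalar else "bind"  # assumed default ("probably")


def copy_steps(flags: ParamFlags) -> list:
    """List the copies that copy semantics require at entry and before return."""
    steps = []
    if flags.input:
        steps.append("copy actual -> local at routine entry")
    if flags.output:
        steps.append("copy local -> actual before return")
    return steps
```

For example, a parameter flagged only as input (the BLISS and C rows of Table 17) comes out as copy semantics with a single copy at routine entry.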






TABLE 17 



Settings of Parameter Semantic Flags For Various Source Languages

Language Semantics               Expose/Conceal Alias Effects   Input/Output

BLISS parameters                 Don't care                     Input
C parameters                     Don't care                     Input
Standard FORTRAN parameters      Don't care                     Input/Output
(Old) VAX FORTRAN parameters     Expose                         Input/Output
Pascal value parameters          Conceal                        Input
Pascal VAR parameters            Expose                         Input/Output
Ada atomic parameters            Conceal                        see Note
Ada aggregate parameters         Don't care                     see Note
PL/I parameters                  Expose                         Input/Output



Note: As specified by the IN, OUT, or IN OUT modifiers in the parameter specification 
in the Ada routine declaration. 



TABLE 18 



The GEM$MECHANISM Enumerated Type


Constant


Meaning


Value

The caller passes the value of the argument. The actual storage location is the entry in the parameter
list.

Reference

The caller passes the address of some storage location. The actual storage location is the storage
location whose address was passed in the parameter list.

Reference parameters have a length parameter field, which may be defined to point to another
parameter symbol in the same routine. This other parameter, which must have data type IMAX and
the value mechanism, is assumed to receive the actual length of the reference parameter, whose
unknown size flag will presumably be set. (This combination of a storage location passed by reference
and an associated length passed by value is sometimes referred to as an "address and length"
mechanism.)

String

The caller passes the address of a data structure containing the address and length of a character or
bit string (the maximum length, for a varying character string). The storage location associated with
the parameter symbol is the contents of the base address field in the descriptor data structure.

Array

The caller passes the address of a data structure describing a character or bit string as a one-
dimensional array or bit array. The storage location associated with the parameter symbol is the
contents of the base address field in the descriptor data structure.

General

The caller passes the address of a data structure containing the address of some storage location.
The storage location associated with the parameter symbol is the contents of the base address field
in the descriptor data structure.

The front end is responsible for generating code in the caller to fill in all fields of the descriptor data
structure other than its base address field, and for generating code in the called routine to interpret
those fields. The called routine gets the address of the descriptor using the DESCADDR tuple.
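
The five mechanisms of Table 18 differ mainly in how many levels of indirection separate the parameter list entry from the actual storage location. The Python sketch below illustrates that reading. The `Mechanism` enum, the `storage_address` function, the dictionary used to model descriptors in memory and its `"base_address"` key are all hypothetical names introduced for this illustration.

```python
from enum import Enum


class Mechanism(Enum):
    VALUE = "value"
    REFERENCE = "reference"
    STRING = "string"
    ARRAY = "array"
    GENERAL = "general"


def storage_address(mechanism: Mechanism, entry: int, descriptors: dict):
    """Return the address of the actual storage location for a parameter.

    entry is the content of the parameter list entry; descriptors maps a
    descriptor address to a dict holding at least a "base_address" field.
    Returns None for the value mechanism, where the parameter list entry
    itself is the storage location.
    """
    if mechanism is Mechanism.VALUE:
        return None
    if mechanism is Mechanism.REFERENCE:
        return entry                       # the entry is the address
    # STRING, ARRAY and GENERAL all pass the address of a descriptor.
    return descriptors[entry]["base_address"]


# A string descriptor at address 0x2000 describing 12 characters at 0x5000.
memory = {0x2000: {"base_address": 0x5000, "length": 12}}
```

Under this reading the three descriptor mechanisms are resolved the same way and differ only in the other descriptor fields.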
TABLE 19



Dynamic String Return Mechanisms


Mechanism


Description


Fixed Buffer

The caller allocates a fixed size buffer and passes a descriptor for it. The called routine copies
as much of the aggregate as will fit into the buffer, and then returns the original length of the
aggregate. The caller can compare the original length to the buffer length to determine whether
the return value has been truncated. (This is equivalent to the fixed-size mechanism described
above, with an extra return value for the length.)

Stack

The caller passes the address of a descriptor. The called routine leaves the aggregate on the
stack (beyond the call frame of its caller), leaves the stack pointer pointing past the aggregate,
and fills in the descriptor to specify the address and length of the aggregate.

Dynamic string

The caller passes a descriptor for a heap-allocated string (a dynamic string descriptor). The called
routine either overwrites the string pointed to by the descriptor or deallocates that string, allocates
another one, and updates the descriptor.
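
The fixed buffer mechanism of Table 19 can be sketched in a few lines of Python. The sketch models the caller's buffer as a `bytearray` and the aggregate as `bytes`; the function names `return_fixed_buffer` and `was_truncated` are hypothetical and the descriptor is reduced to the buffer object itself.

```python
def return_fixed_buffer(aggregate: bytes, buffer: bytearray) -> int:
    """Called-routine side: copy what fits and return the original length."""
    count = min(len(aggregate), len(buffer))
    buffer[:count] = aggregate[:count]   # copy as much of the aggregate as fits
    return len(aggregate)


def was_truncated(original_length: int, buffer: bytearray) -> bool:
    """Caller side: compare the returned length with the buffer length."""
    return original_length > len(buffer)


# The caller allocates a 4-byte buffer; the routine wants to return 7 bytes.
caller_buffer = bytearray(4)
returned_length = return_fixed_buffer(b"abcdefg", caller_buffer)
```

Returning the original length rather than the copied length is what lets the caller detect truncation without any extra status value.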



TABLE 20 





Attributes of Argument Tuples 




Attribute 


Meaning 




Pass by register 


Indicates whether the argument is to be passed in a particular register or in the
location determined by the system calling standard for the particular architecture.
If it is true, the argument should be passed in the register whose identifier (from
the GEN$TS_REG enumerated type) is in the arg location field. If it is false, then
arg location is simply a 1-origin index among all the non-register arguments of
this call, and GEM will determine the appropriate "standard" argument location.
(GEM may override the argument location specified by arg location and pass by
register if it has both the calling and called routines available to it, so that it can
do the necessary analysis.)


Special register


May be true only if pass by register is also true, in which case it indicates that
GEM must use the specified register.


Arg location2 Pass by register2


Relevant only if mechanism is reference, in which case these fields specify the
argument location where the argument's length should be passed by value. The
length will not be passed if arg location2 is 0.


Parm is read


A flag which, if true, indicates that GEM should assume that the called routine
might examine the contents of the actual argument location which is passed to
it. (This is meaningful only if mechanism is not value.)


Parm is written


A flag which, if true, indicates that GEM should assume that the called routine
might modify the contents of the actual argument location which is passed to it.
(This is meaningful only if mechanism is not value.)


Desc size


Meaningful only if mechanism is general, in which case it is the size of the
descriptor that will be allocated to pass the argument.


Offset


Used only in the various ARGADR tuples, where it specifies the offset of the
actual argument address from the tuple's address operand.


Effects


Used only in the various ARGADR tuples, where it characterizes the 'read' side
effects resulting from passing the argument.


Effects2


Used only in the various ARGADR tuples, where it characterizes the 'write' side
effects resulting from passing the argument.


Base Symbol


Used only in the various ARGADR tuples, where it is a pointer to the symbol node
for the variable whose address is being passed, if one is known.



TABLE 21



Routine Call, Argument Passing, and Value Return Operators


Operator 


Meaning 






Call Initialization 




INITCALL 


Marks the beginning of the IL for a routine call, and causes allocation of its argument list.




Passing a Value 


ARGVAL

Passes a representational value as an argument.

ARGVALA

Passes a character or bit string value with a specified length.






Passing an Address 


ARGADR

Passes the address of a storage location containing a representational value.

ARGADRA

Passes the address of a storage location containing a character or bit string of a specified length.

ARGADRS

Passes a substring of the bit or character string in the storage location at a specified address.






Allocating and Passing a Temporary


ARGTMP

Allocates space for a scalar value and passes its address.

ARGTMPA

Allocates space for a character or bit string of a specified size and passes its address.






Creating a Dynamic Return Value Descriptor 


ARGBUF
ARGDYN
ARGSTK


Allocates space for a bit or character string of a specified size and passes a descriptor requiring
that a value be returned in it with the fixed buffer dynamic return mechanism.

Passes a descriptor requiring that a character or bit string be returned with the stack dynamic
return mechanism.

Passes a dynamic string descriptor requiring that a bit or character string be returned with the
dynamic string or stack dynamic return mechanism.



Passing Arguments in a Block


BLKFIELD
ARGDEFINES


Allocates space for a block of a specified size and passes its address.

Stores a scalar value into a field of a previously allocated argument block.

Describes the side effects which are attributable to passing an argument through an argument
block.






Filling in a General Descriptor




DSCFIELD


Stores an address or integer value into a field of a previously allocated general descriptor.



Calling a Routine


CALL 


Calls the routine at a specified address. 






Retrieving a Return Value 


RESULTBUF

Retrieves a character or bit string value which has been returned in the temporary which was
allocated with an ARGBUF tuple, and whose length has been returned in a specified register.

RESULTDYN

Yields a dynamic string descriptor for the character or bit string which has been returned in
response to an ARGDYN tuple.

RESULTREG

Retrieves a scalar result value from a specified register.

RESULTSTK

Retrieves a character or bit string value which has been returned on the stack in response to an
ARGSTK tuple.

RESULTTMP

Retrieves a result value from a temporary which was allocated with an ARGTMP or ARGTMPA
tuple.






Returning a Value From a Routine 


RETURNDYN

Returns a character or bit string value by whatever dynamic return mechanism was specified
in the descriptor passed by the caller.

RETURNREG

Returns a scalar value in a specified register.

RETURNSTK

Returns a character or bit string value by the fixed buffer dynamic return mechanism if the caller
passed a fixed buffer descriptor, or by the stack dynamic return mechanism if the caller passed
a stack or dynamic string descriptor.




Miscellaneous Parameter Access 


DESCADDR

Yields the address of the descriptor that was allocated to pass a general mechanism parameter.

SIZE

Yields the actual size of an unknown size parameter.



APPENDIX 



Interpreter Control Actions 

The following actions control the execution flow of the actions interpreter.

ACTIONS (<result-var-list>; <temporary-var-list>) marks the beginning of the action sequence of a template. This
must be the first action in the template since it allocates the operand variables. The contents of both var-lists is a
comma separated sequence of identifiers used to name the operand variables during the rest of the template. Either of
these var-lists may be empty if the template does not use either result operands or temporary operands.

The identifiers in the result-var-llst are the names of the result operands. ILG nodes in voW context having 0 result 
operands while most other expressions have 1 result operBnd. Exceptions include string results which require two or 
three operands (one to address the string body, one for the string length and one to hold the string body) and complex 
results which require two operands (one for the real component and another for the imaginary component). 

DELAY marks the end of the undelayed actions and the beginning of the delayed actions. When the DELAY action
is interpreted, processing of the current template is suspended until the corresponding ILG subtree is used as a leaf
of a parent subtree. When the template of the parent subtree undelays the corresponding leaf, interpretation will
continue with the actions following the DELAY action.

EXIT terminates interpretation of the action sequence. Interpreting an EXIT action causes the result operands to
be returned, causes the remaining operand variables and local TNs to be released, and causes interpretation to resume
with the template that UNDELAYed this action sequence.

END_ACTIONS marks the end of an action sequence. This is not truly an action since it is not interpreted. The
END_ACTIONS operation must be the lexically last component of the action sequence. The operation marks the end
of the scope of the operand identifiers declared in the ACTIONS operation.

UNDELAY(leaf,opr1,opr2,...) causes the delayed context actions of the specified pattern "leaf" to be processed.
The result operands of the leaf are copied into operand variables "opr1", "opr2", etc. The number of copied operands
must match the number of result operands in the template of the leaf.

LABEL(name) causes "name" to label the current position in the action sequence.

GOTO(name) causes the interpreter to branch and continue processing at the action following the label specified
by "name".

TN Allocation And Lifetime Actions 

INCREMENT_LON() increments the Linear Order Number clock variable that is used to determine the lifetimes of
TNs.

USE(operand) causes the specified operand variable to be referenced. This action is used to mark the last place
in a template where an operand is used and causes lifetimes to be extended appropriately.

ALLOCATE_PERMANENT(operand, size) causes a permanent class TN of "size" bytes to be created and referenced
by the specified "operand" variable. If the "size" parameter is missing then the size of the TN is determined by
the result data type of the current template. This action only creates a TN during the CONTEXT pass. See the SAVE_TN
action for a description of how this TN is accessed during the TNBIND and CODE passes.

ALLOCATE_DELAYED(operand, size) causes a delayed class TN of "size" bytes to be created and referenced by
the specified "operand" variable. If the "size" parameter is missing then the size of the TN is determined by the result
data type of the current template. This action creates a TN during each of the CONTEXT, TNBIND and CODE passes.
This action may not be performed while interpreting the undelayed actions. The lifetime of this TN terminates when
the result using this TN is used.

ALLOCATE_LOCAL(operand, size) causes a local class TN of "size" bytes to be created and referenced by the
specified "operand" variable. If the "size" parameter is missing then the size of the TN is determined by the result data
type of the current template. This action creates a TN during each of the CONTEXT, TNBIND and CODE passes. The
lifetime of this TN must terminate in the same template as its creation.

FORCE_REGISTER(operand) causes the TN specified in the "operand" variable to be marked as must not be in 
memory. This usually means allocation to a register unless no register is available in which case the TN is not allocated. 

FORCE_MEMORY(operand) causes the TN specified in the "operand" variable to be marked as must not be in a
register. This usually guarantees allocation to a stack location.

MUST_ALLOCATE(operand) causes the TN specified in the "operand" variable to be marked as must be allocated.

Note: It is an error to do all three of FORCE_REGISTER, FORCE_MEMORY and MUST_ALLOCATE on the same
TN as these three conditions are contradictory and cannot be all fulfilled.
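
By way of illustration only, the contradiction described in the note above could be detected with a check of the following kind. This is a minimal Python sketch and not part of the described framework; the TN class, its field names and the check_constraints function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class TN:
    """Hypothetical stand-in for a temporary name (TN) and its allocation flags."""
    name: str
    force_register: bool = False  # set by FORCE_REGISTER: must not be in memory
    force_memory: bool = False    # set by FORCE_MEMORY: must not be in a register
    must_allocate: bool = False   # set by MUST_ALLOCATE: must be allocated


def check_constraints(tn: TN) -> None:
    """Raise ValueError when all three mutually contradictory flags are set."""
    if tn.force_register and tn.force_memory and tn.must_allocate:
        raise ValueError(
            f"TN {tn.name}: FORCE_REGISTER, FORCE_MEMORY and MUST_ALLOCATE "
            "cannot all be fulfilled"
        )
```

Any two of the three flags may be combined in this sketch; only the combination of all three is rejected, which mirrors the wording of the note.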

PREFERENCE(operand1,operand2): if "operand1" is allocated to a register then "operand2" is allocated to the same
register; otherwise, "operand2" is allocated independently of "operand1". Forcing "operand2" to the same register as
"operand1" occurs even if "operand1" and "operand2" have conflicting lifetimes. (See the MOVE_VALUE action for
"advisory" preferencing as opposed to the "mandatory" preferencing of the PREFERENCE action.)

INCREMENT_COST(number,operand) increases the cost of nonallocation of the TN specified by "operand" by
the amount "number".

RESERVE_R0(number) causes "number" of consecutive registers to be reserved starting with register 0.

TEST_MEMORY(operand,label) tests the TN referenced by the specified "operand" variable. If the TN is in memory
then the action interpreter branches to the specified "label". During the CONTEXT and TNBIND passes this action
assumes that unallocated TNs are not in memory unless they have had a FORCE_MEMORY done on them.

TEST_REGISTER(operand,label) tests the TN referenced by the specified "operand" variable. If the TN is in a
register then the action interpreter branches to the specified "label". During the CONTEXT and TNBIND passes this
action assumes that unallocated TNs are in registers unless a FORCE_MEMORY has been done on the TN.

ILG Load And Save Actions

LOAD_LITERAL(node,operand) loads the literal value of the specified "node" matched by the template pattern
into the specified "operand" variable. It is an error if "node" is not a LITREF.

SAVE_TN(operand,node,field) saves a reference to the permanent class TN specified by the "operand" variable.
During the CONTEXT pass the TN pointer is saved in component "field" of the ILG tuple matched by the specified
"node" of the template. During the TNBIND and CODE passes this information is fetched from the specified "field" of
the specified "node". Every permanent class TN must be saved during the CONTEXT pass in an appropriate ILG field
so that the same TN can be located during the TNBIND and CODE passes. Delayed class and local class TNs are
recreated each pass so they must never be saved.

SAVE_OPERAND(operand,node,field_reg,field_base) saves the location of the specified "operand" variable. The
information is saved in the ILG tuple matched by the specified "node" of the template. A register value is saved in
component "field_reg". Certain register values encode that no allocation occurred or that the operand is allocated on
the stack instead of a register. If an operand is allocated to the stack, the stack offset is saved in the component of
"node" specified by "field_base".

SAVE_REGISTER(operand,node,field) saves the register number of the specified "operand" in the specified "field"
of the specified "node" matched by the template pattern. This set of register numbers includes an encoding that no
register was allocated. An error occurs if the specified operand is allocated to a memory location.

Code Emitting Actions 

MOVE_VALUE(opr_src,opr_dst) generates the code to move a value from the "opr_src" operand to the "opr_dst"
operand. No code is generated if opr_src and opr_dst are identical, and this action is a hint to the allocator to make
them identical.

EMIT(opcode,operand1,operand2,...) outputs an object instruction consisting of the specified "opcode" and using
the specified operand variables as address modes of the instruction.

MAKE_ADDRESS_MODE(opr_offset,opr_base,opr_index,opr_result) makes a new operand in variable
"opr_result". This is a VAX specific action that uses "opr_offset" as the offset, "opr_base" as the base register and
"opr_index" as the index register in order to create a VAX address mode. If the "opr_offset" is missing then zero is
assumed. If "opr_offset" specifies a memory location then "opr_base" must be missing. If "opr_base" specifies a memory
location then "opr_offset" must specify zero and "opr_index" must be missing.
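
The operand rules stated for MAKE_ADDRESS_MODE can be restated as a small validation routine. The following Python sketch is illustrative only: the Operand class, its "kind" classification and the tuple returned are assumptions made for the example, and nothing here reflects how the action is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Operand:
    """Hypothetical operand: kind is "literal", "register" or "memory"."""
    kind: str
    value: object


ZERO = Operand("literal", 0)


def make_address_mode(
    offset: Optional[Operand],
    base: Optional[Operand],
    index: Optional[Operand],
) -> Tuple[Operand, Optional[Operand], Optional[Operand]]:
    """Check the stated operand rules and return (offset, base, index)."""
    if offset is None:
        offset = ZERO  # a missing offset is taken to be zero
    if offset.kind == "memory" and base is not None:
        raise ValueError("a memory offset requires the base to be missing")
    if base is not None and base.kind == "memory":
        if offset != ZERO:
            raise ValueError("a memory base requires a zero offset")
        if index is not None:
            raise ValueError("a memory base requires the index to be missing")
    return (offset, base, index)
```

A call such as make_address_mode(None, Operand("register", "R1"), Operand("register", "R2")) passes the checks with the offset defaulted to zero, whereas a memory base combined with an index is rejected.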

LOAD_CONSTANT(number,operand) makes a new address mode in "operand" representing the specified literal
"number". Note that "number" is the literal value, not a node matched by the pattern. Instead use LOAD_LITERAL to
create an address mode that contains the value of a LITREF ILG node.

EXAMPLES 

There are several examples here, including very simple addition templates and very complicated addressing
templates. These should give examples of both easy and difficult to write templates.

The result value mode of a template and the set of value modes of pattern match leaves use a data type
characteristic of the target architecture. These value modes are an enumeration of the different ways a value may be encoded.
This enumeration names the various ways expression values may be encoded in the virtual machine.
Examples for the VAX: 

RV (Register Value).
MV (Memory Value without indirection and without indexing).
MVIND (Memory Value with indirection but without indexing).
MV1 (Memory Value with byte context).
MV2 (Memory Value with word context).
MV4 (Memory Value with long context).
MV8 (Memory Value with quad context).
MV16 (Memory Value with octa context).
AM (Address Mode without indirection and without indexing).
AMIND (Address Mode with indirection but without indexing).
AMINX1 (Address Mode with byte indexing).
AMINX2 (Address Mode with word indexing).
AMINX4 (Address Mode with long indexing).
AMINX8 (Address Mode with quad indexing).
AMINX16 (Address Mode with octa indexing).
PCFLOW (Flow bool represented by jump to false label or true label).
STRINGV (String value encoded as a length and a memory address).
VARYV (Varying string value encoded as address of length word).
VOID (There is no value; used on an operation with only side-effects).

Simple ADDL3 On A VAX

Result value mode: RV

Pattern tree:

    0: ADD_INT32 1,2
    1: LEAF {RV,MV,MVIND,MV4}
    2: LEAF {RV,MV,MVIND,MV4}

Cost: 2

Actions:

    Actions(result; leaf1, leaf2);
    ! "result" is the result temporary
    ! "leaf1" is LEAF 1: (the left operand)
    ! "leaf2" is LEAF 2: (the right operand)
    Undelay(1,leaf1);
    Undelay(2,leaf2);
    Use(leaf1);
    Use(leaf2);
    Increment_LON;
    Allocate_Permanent(result);
    Save_TN(result,0,ILG_TN);
    Emit(ADDL3,leaf1,leaf2,result);
    Delay;
    Exit;
    End_Actions;

Note: the heuristics used in the register allocator guarantee a high probability that the result operand will be allocated
identically to one of operand 1 or operand 2. Such an allocation will result in an ADDL2 instruction instead of ADDL3.
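
The effect described in the note can be pictured with a short Python sketch. It is an assumption-laden illustration, not the code generator of the framework: the function name select_add and the representation of operands as plain strings are hypothetical, and the only fact relied on is that the VAX ADDL2 instruction adds its first operand into its second.

```python
def select_add(leaf1: str, leaf2: str, result: str) -> str:
    """Return ADDL2 when the result shares a location with a source operand."""
    if result == leaf1:
        # result already holds operand 1, so add operand 2 into it
        return f"ADDL2 {leaf2},{result}"
    if result == leaf2:
        # addition is commutative, so add operand 1 into operand 2's location
        return f"ADDL2 {leaf1},{result}"
    return f"ADDL3 {leaf1},{leaf2},{result}"
```

With operands in R1 and R2 and the result also allocated to R1, the sketch yields the two-operand form; with the result in a third register it falls back to ADDL3.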

Simple SUBL3 On A VAX

Result value mode: RV

Pattern tree:

    0: SUB_INT32 1,2
    1: LEAF {RV,MV,MVIND,MV4}
    2: LEAF {RV,MV,MVIND,MV4}

Pattern tests:
    none

Cost: 2

Actions:

    Actions(result; leaf1, leaf2);
    ! "result" is the result temporary
    ! "leaf1" is LEAF 1: (the left operand)
    ! "leaf2" is LEAF 2: (the right operand)
    Undelay(1,leaf1);
    Undelay(2,leaf2);
    Use(leaf2);
    Increment_LON;
    Use(leaf1);
    Allocate_Permanent(result);
    Save_TN(result,0,ILG_TN);
    Emit(SUBL3,leaf2,leaf1,result);
    Delay;
    Exit;
    End_Actions;

Note: Incrementing the LON after using operand 2 but before using operand 1 increases the probability that the
heuristics of the register allocator will give operand 1 and the result operand the same allocation, which will lead to a SUBL2
instruction instead of SUBL3.

Byte Indexed Address Mode On A VAX

This template generates the lit(base_reg)[index_reg] address mode to do addition. The template follows the VAX
FORTRAN conventions in that choosing this template guarantees that registers will be used to hold the two operands.

Result value mode: AMINX1

Pattern tree:

    0: ADD_INT32 1,2
    1: LITREF_INT32
    2: ADD_INT32 3,4
    3: LEAF {RV}
    4: LEAF {RV}

Pattern tests:

    NO_OVERFLOW(0);
    NO_OVERFLOW(2);

Cost: 1

Actions:

    Actions(result; index_reg, base_reg, leaf4, leaf3, lit);
    ! "result" is result address mode lit(base_reg)[index_reg]
    ! "index_reg" is the index scratch register
    ! "base_reg" is the base scratch register
    ! "leaf4" is LEAF 4: (index leaf)
    ! "leaf3" is LEAF 3: (base leaf)
    ! "lit" is LITREF 1:
    Delay;
    ! Force LEAF 4: into a register
    !
    Undelay(4,leaf4);
    Allocate_Delayed(index_reg);
    Force_Register(index_reg);
    Must_Allocate(index_reg);
    Preference(leaf4,index_reg);
    Save_Register(index_reg,0,ILG_Index_Reg);
    Move_Value(leaf4,index_reg);
    Use(leaf4);
    ! Force LEAF 3: into a register
    !
    Undelay(3,leaf3);
    Allocate_Delayed(base_reg);
    Force_Register(base_reg);
    Must_Allocate(base_reg);
    Preference(leaf3,base_reg);
    Save_Register(base_reg,0,ILG_Base_Reg);
    Move_Value(leaf3,base_reg);
    Use(leaf3);
    ! Generate address mode "lit(leaf3)[leaf4]"
    !
    Load_Literal(1,lit);
    Make_Address_Mode(lit,base_reg,index_reg,result);
    Increment_LON;
    Exit;
    End_Actions;

Note that the 7 actions forcing a LEAF into a register will probably be a common operation on a VAX. As a result there
will be a "macro" action that has the effect of combining these 7 actions.

Using MOVA For Addition On PRISM Revision 0.0

Result value mode: RV

Pattern tree:

    0: ADD_INT64 1,2
    1: LITREF_INT64
    2: LEAF {RV}

Pattern tests:

    Lit_14_Bit(1); ! Succeeds if the literal fits in 14 bits

Cost: 1

Actions:

    Actions(result; leaf2, reg2, reg_result, lit);
    ! "result" is result temporary
    ! "leaf2" describes Leaf 2:
    ! "reg2" is a scratch register for holding Leaf 2:
    ! "reg_result" is a scratch register for computing result
    ! "lit" is Literal 1:
    Undelay(2,leaf2);
    Allocate_Local(reg2);
    Force_Register(reg2);
    Must_Allocate(reg2);
    Save_Register(reg2,0,ILG_reg_0);
    Move_Value(leaf2,reg2);
    Use(leaf2);
    Use(reg2);
    Allocate_Local(reg_result);
    Force_Register(reg_result);
    Must_Allocate(reg_result);
    Save_Register(reg_result,0,ILG_reg_temp);
    Use(reg_result);
    Increment_LON;
    Allocate_Local(result);
    Save_TN(result,0,ILG_TN);
    Load_Literal(1,lit);
    Emit(MOVA_Move_Format,lit,reg2,reg_result);
    Move_Value(reg_result,result);
    Delay;
    Exit;
    End_Actions;

Note: the heuristics of the register allocator guarantee that leaf2 and reg2 have a high probability of getting the
same register. Also, result and reg_result will most likely get the same register.

Long Context Indexing On VAX

This template generates the lit(leaf3)[leaf6] address mode to do multiplication by 4 followed by addition. The
template follows the VAX PASCAL conventions in that choosing this template does not guarantee that registers will be
available to hold the two operands. If registers are not available then the address mode is simulated using memory
temporaries.

Result value mode: AMINX4

Pattern tree:

    0: ADD_INT32 1,2
    1: LITREF_INT32
    2: ADD_INT32 3,4
    3: LEAF {RV}
    4: MUL_INT32 5,6
    5: LIT_INT32
    6: LEAF {RV}

Pattern tests:

    NO_OVERFLOW(0);
    NO_OVERFLOW(2);
    NO_OVERFLOW(4);
    LITERAL_4(5); ! Succeeds if literal value is 4

Cost: 1

Actions:

    Actions(result; index_reg, base_reg, leaf6, leaf3, lit, temp);
    ! "result" is the result address mode
    ! "index_reg" is the index scratch register
    ! "base_reg" is the base scratch register
    ! "leaf6" is LEAF 6: (index leaf)
    ! "leaf3" is LEAF 3: (base leaf)
    ! "lit" is LITREF 1:
    ! "temp" is literal #2 (No_Index case)
    !        or is (leaf3)[index_reg]
    !        (Index_Has_Reg_Temp case)
    Delay;
    Load_Literal(1,lit);
    Undelay(6,leaf6);
    Undelay(3,leaf3);
    Allocate_Delayed(index_reg);
    Increment_Cost(3,index_reg);
    Preference(leaf6,index_reg);
    Allocate_Delayed(base_reg);
    Preference(leaf6,base_reg);
    Increment_LON;
    Test_Memory(index_reg,No_Index);
    Move_Value(leaf6,index_reg);                        ! Make sure Index in register
    Test_Memory(base_reg,No_Base);
    Move_Value(leaf3,base_reg);                         ! Make sure Base in register
    Make_Address_Mode(lit,base_reg,index_reg,result);   ! lit(base_reg)[index_reg]
    Exit;

    Label(No_Index);                                    ! No register index temp
    Load_Constant(2,temp);
    Emit(ASHL,temp,leaf6,index_reg);                    ! ASHL #2,leaf6,index_mem
    Emit(ADDL2,leaf3,index_reg);                        ! ADDL2 leaf3,index_mem
    Emit(ADDL2,lit,index_reg);                          ! ADDL2 #lit,index_mem
    Make_Address_Mode(,index_reg,,result);              ! @index_mem
    Exit;

    Label(No_Base);                                     ! No register base temp
    Test_Memory(leaf3,Index_Has_Reg_Temp);              ! Index is not in temp
    Emit(ADDL3,lit,leaf3,base_reg);                     ! ADDL3 #lit,leaf3,base_mem
    Make_Address_Mode(,base_reg,index_reg,result);      ! @base_mem[index_reg]
    Exit;

    Label(Index_Has_Reg_Temp);                          ! No base reg but index in temp
    Make_Address_Mode(,leaf3,index_reg,temp);
    Emit(MOVAL,temp,index_reg);                         ! MOVAL @leaf3[index_reg],index_reg
    Emit(ADDL2,lit,index_reg);                          ! ADDL2 #lit,index_reg
    Make_Address_Mode(,index_reg,,result);              ! (index_reg)
    Exit;
    End_Actions;



APPENDIX 

Definition Of Basic Types

The following routines define basic types that correspond to the representational types defined by the GEM IL.
GEM_TD_DEF_BASIC_TYPE defines the types nil, address, signed and unsigned integer, float and complex.
GEM_TD_DEF_CHAR_TYPE allows the definition of characters defined over a number of base types.

Note that boolean is not considered a basic type. It is suggested that compilers for languages such as Pascal 
define boolean as an enumeration containing the elements true and false. 



TYPE_NODE =
    GEM_TD_DEF_BASIC_TYPE(
        DECL_BLK    : in_out GEM_BLOCK_NODE,
        LOCATOR     : value,
        TYPE_NAME   : in VS_STR,
        BASIC_TYPE  : value)

Defines a basic type such as integer or real. DECL_BLK is the block node in which the type is defined. LOCATOR is
a GEM or foreign locator. LOCATOR may be the null locator. TYPE_NAME is a varying string describing the type and
may be null. BASIC_TYPE is the type being defined and must be an element of the GEM_TYP enumeration. Specifically
excluded are the BOOL, BITS, STR8, and STR16 GEM_TYP elements.



TYPE_NODE =
    GEM_TD_DEF_CHAR_TYPE(
        DECL_BLK    : in_out GEM_BLOCK_NODE,
        LOCATOR     : value,
        TYPE_NAME   : in VS_STR,
        BASIC_TYPE  : value)

Defines a character as a basic type. For example, a character may be UINT8, UINT16, UINT32, etc. DECL_BLK
is the block node in which the type is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator.
TYPE_NAME is a varying string describing the type and may be null. BASIC_TYPE is the type being defined and
determines the size and representation of the character set. It must be an element of the GEM_TYP enumeration and
is restricted to the signed and unsigned integers of size 8, 16, and 32 bits.

Definition Of Character And String Aggregates 

GEM_TD_DEF_STRING and GEM_TD_DEF_BITSTRING define character and bit aggregates of a given base
type.

TYPE_NODE =
    GEM_TD_DEF_STRING(
        DECL_BLK     : in_out GEM_BLOCK_NODE,
        LOCATOR      : value,
        TYPE_NAME    : in VS_STR,
        STRING_TYPE  : value,
        CHAR_TYPE    : value,
        STRING_LB    : in GEM_NODE,
        STRING_UB    : in GEM_NODE)



Defines a character string of STRING_TYPE. The elements of the string are characters of the type defined by
CHAR_TYPE, and the string has lower and upper bound operands STRING_LB and STRING_UB. The string size (number
of elements) is STRING_UB - STRING_LB + 1. A character string of unknown size is indicated by a STRING_UB value
less than the STRING_LB value.

DECL_BLK is the block node in which the type is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be
the null locator. TYPE_NAME is a varying string describing the type and may be null. STRING_TYPE is the string
representation and is defined as being a member of the enumeration GEM_STRING_REPR. CHAR_TYPE is a handle
to the type node created for the string's character type returned by a call to GEM_TD_DEF_CHAR_TYPE.
STRING_UB and STRING_LB are the upper and lower bounds of the string.
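
The size rule given above for strings (and repeated below for bitstrings) amounts to a single computation. The following Python sketch shows it under the assumption that both bounds are known integers; the function name string_size and the use of None for an unknown size are choices made for this illustration only.

```python
from typing import Optional


def string_size(lower_bound: int, upper_bound: int) -> Optional[int]:
    """Number of elements, or None when the size is unknown (UB < LB)."""
    if upper_bound < lower_bound:
        return None  # an upper bound below the lower bound marks unknown size
    return upper_bound - lower_bound + 1
```

For example, bounds of 1 and 80 describe an 80-element string, while bounds of 1 and 0 describe a string of unknown size.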
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10 



TYPE_NODE =
    GEM_TD_DEF_BITSTRING(
        DECL_BLK      : in_out GEM_BLOCK_NODE,
        LOCATOR       : value,
        TYPE_NAME     : in VS_STR,
        BITSTRING_LB  : in GEM_LITERAL_NODE,
        BITSTRING_UB  : in GEM_LITERAL_NODE)



Defines a bitstring consisting of BITSTRING_UB - BITSTRING_LB + 1 elements. A bitstring of unknown size is indicated
by a BITSTRING_UB value less than the BITSTRING_LB value.

DECL_BLK is the block node in which the type is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be
the null locator. TYPE_NAME is a varying string describing the type and may be null. BITSTRING_UB and
BITSTRING_LB are the upper and lower bounds of the bitstring.

Definition Of Typedefs And Pointers

GEM_TD_DEF_TYPEDEF supports the definition of a new name or synonym for an existing type.
GEM_TD_DEF_POINTER allows the definition of a typed or untyped pointer. GEM_TD_SET_POINTER_TYPE
sets the type of a previously specified pointer after the type associated with a pointer has its type information specified
to the GEM type definition service.



TYPE_NODE =
    GEM_TD_DEF_TYPEDEF(
        DECL_BLK   : in_out GEM_BLOCK_NODE,
        LOCATOR    : value,
        TYPE_NAME  : in VS_STR,
        DEF_TYPE   : value)



Define a new type name and associate it with the type represented by the type node DEF_TYPE. DECL_BLK is the
block node in which the type is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator.
TYPE_NAME is a varying string describing the type and may be null. DEF_TYPE is a type node created for an existing
type definition.



TYPE_NODE =
    GEM_TD_DEF_POINTER(
        DECL_BLK      : in_out GEM_BLOCK_NODE,
        LOCATOR       : value,
        TYPE_NAME     : in VS_STR,
        POINTER_TYPE  : value)

Define a pointer type. POINTER_TYPE may be a type node for an existing type definition or null, indicating an untyped
pointer. TYPE_NAME is a varying string describing the type and may be null. LOCATOR is a GEM or foreign locator.
LOCATOR may be the null locator. DECL_BLK is the block node in which the type is defined.



GEM_TD_SET_POINTER_TYPE(
    POINTER_TYPE  : value,
    NEW_TYPE      : value)

For the existing pointer definition created by a call to GEM_TD_DEF_POINTER, redefine the type associated with the pointer.
POINTER_TYPE is a handle to the existing type node defined for a pointer. NEW_TYPE is the handle to a type node
created for an existing type definition.
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25 



Definition Of Ranges, Enumerations, And Sets

The GEM_TD_DEF_RANGE, GEM_TD_DEF_ENUM, GEM_TD_SET_ENUM_ELEMENT and
GEM_TD_DEF_SET routines define ranges, enumerations, enumeration elements and sets over defined types.

TYPE_NODE =
    GEM_TD_DEF_RANGE(
        DECL_BLK        : in_out GEM_BLOCK_NODE,
        LOCATOR         : value,
        TYPE_NAME       : in VS_STR,
        RANGE_TYPE      : value,
        RANGE_LOW_VAL   : in GEM_LITERAL_NODE,
        RANGE_HIGH_VAL  : in GEM_LITERAL_NODE)

Define a range type. The range is defined by its underlying type, RANGE_TYPE, and the low and high values of the
range, as indicated by the literal nodes RANGE_LOW_VAL and RANGE_HIGH_VAL. DECL_BLK is the block node in
which the type is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator. TYPE_NAME is
a varying string describing the type and may be null. RANGE_TYPE is a handle to a type node of an existing basic
type definition. RANGE_LOW_VAL and RANGE_HIGH_VAL are pointers to literal nodes indicating the low and high
values in a range.



TYPE_NODE =
    GEM_TD_DEF_ENUM(
        DECL_BLK   : in_out GEM_BLOCK_NODE,
        LOCATOR    : value,
        TYPE_NAME  : in VS_STR,
        ENUM_TYPE  : value)

Define an enumeration. The enumeration's elements are defined by calls to the routine
GEM_TD_SET_ENUM_ELEMENT. DECL_BLK is the block node in which the type is defined. LOCATOR is a GEM or
foreign locator. LOCATOR may be the null locator. ENUM_TYPE is a handle to a type node created for an existing
basic type definition.

A front end must apply enumeration elements to the enumeration definition in first to last order.




TYPE_NODE =
    GEM_TD_SET_ENUM_ELEMENT(
        ENUM_TYPE           : value,
        LOCATOR             : value,
        ENUM_ELEMENT_NAME   : in VS_STR,
        ENUM_ELEMENT_VALUE  : in GEM_LITERAL_NODE)

Define, for an enumeration indicated by the type node handle ENUM_TYPE, an element named
ENUM_ELEMENT_NAME with a value ENUM_ELEMENT_VALUE. ENUM_TYPE is a handle to an existing type node
for an enumeration. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator.
ENUM_ELEMENT_NAME is a varying string defining the enumeration element.
ENUM_ELEMENT_VALUE is a literal node defining the element's value.
GEM_.TD^SET_SEL 

TYPE_NODE =
    GEM_TD_DEF_SET(
        DECL_BLK   : in_out GEM_BLOCK_NODE,
        LOCATOR    : value,
        TYPE_NAME  : in VS_STR,
        SET_TYPE   : value)

Defines a set of the type defined by the type node handle SET_TYPE. DECL_BLK is the block node in which the type
is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator. TYPE_NAME is a varying string
describing the type and may be null. SET_TYPE may be a handle returned by:

O GEM_TD_DEF_BASIC_TYPE
O GEM_TD_DEF_CHAR_TYPE
O GEM_TD_DEF_ENUM
O GEM_TD_DEF_RANGE
O GEM_TD_DEF_TYPEDEF
Definition Of Arrays

The routines GEM_TD_DEF_ARRAY and GEM_TD_SET_ARRAY_BOUNDS may be used to define arrays and
the bounds of array dimensions. The bounds of array dimensions may be defined as being fixed, adjustable, or
assumed.

TYPE_NODE =
    GEM_TD_DEF_ARRAY(
        DECL_BLK            : in_out GEM_BLOCK_NODE,
        LOCATOR             : value,
        TYPE_NAME           : in VS_STR,
        ARRAY_ELEMENT_TYPE  : value,
        ARRAY_DIM_COUNT     : value)

Define an array of type ARRAY_ELEMENT_TYPE. DECL_BLK is the block node in which the type is declared.
LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator. TYPE_NAME is a varying string describing the
type and may be null. ARRAY_ELEMENT_TYPE is a handle to the type node defining the type of the array elements.
ARRAY_DIM_COUNT is the number of dimensions for the array.

Note that the dimension count is transmitted as a value rather than a literal node.

The bounds of an array's dimensions are specified by means of the GEM_TD_SET_ARRAY_BOUNDS routine.

GEM_TD_SET_ARRAY_BOUNDS(
    ARRAY_TYPE      : value,
    LOCATOR         : value,
    ARRAY_DIM       : value,
    DIM_LOW_BOUND   : in GEM_NODE,
    DIM_HIGH_BOUND  : in GEM_NODE,
    DIM_INDEX_TYPE  : value,
    DIM_STRIDE      : in GEM_LITERAL_NODE)

For the array type definition specified by the handle ARRAY_TYPE, set the bounds of the dimension indicated by
ARRAY_DIM. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator.
DIM_INDEX_LOW and DIM_INDEX_HIGH define the lower and upper bounds of the dimension. DIM_INDEX_TYPE
is a handle to the type node defining the type used to index the array dimension. DIM_STRIDE defines the size, in
bytes, between succeeding elements of the dimension being defined. A constant upper or lower bound is specified
by a literal node. Nonconstant bounds are indicated by symbol nodes that define the location of bounds values.

Definition Of Structures, Variants And Unions

The following routines are used to define structures, including variants, and unions. A structure, which may have
variant components, is defined by calls to the following routines:

O GEM_TD_DEF_STRUCT
O GEM_TD_SET_STRUCT_ELEMENT
O GEM_TD_DEF_STRUCT_SELECTOR
O GEM_TD_DEF_STRUCT_VARIANT
O GEM_TD_SET_SELECTOR_RANGE
O GEM_TD_SET_SELECTOR_DEFAULT
O GEM_TD_DEF_UNION
O GEM_TD_SET_UNION_MEMBER



TYPE_NODE =
    GEM_TD_DEF_STRUCT(
        DECL_BLK        : in_out GEM_BLOCK_NODE,
        LOCATOR         : value,
        TYPE_NAME       : in VS_STR,
        STRUCTURE_SIZE  : value)

Define a structure or record. DECL_BLK is the block node in which the structure is declared. LOCATOR is a GEM or
foreign locator. LOCATOR may be the null locator. TYPE_NAME is a varying string describing the type and may be
null. STRUCTURE_SIZE is the size of the structure in bytes.



GEM_TD_SET_STRUCT_ELEMENT(
    STRUCT_TYPE       : value,
    VARIANT_PARENT    : value,
    LOCATOR           : value,
    ELEMENT_NAME      : in VS_STR,
    ELEMENT_TYPE      : value,
    ELEMENT_LOC_BYTE  : in GEM_LITERAL_NODE,
    ELEMENT_LOC_BIT   : in GEM_LITERAL_NODE,
    ELEMENT_SIZE      : in GEM_LITERAL_NODE)

Define an element of the structure defined by the structure definition handle STRUCT_TYPE. The element is named
ELEMENT_NAME and has a type defined by the type node handle ELEMENT_TYPE. VARIANT_PARENT is the
immediate parent variant of the element, or null if the element does not define a member of a variant. LOCATOR is a GEM
or foreign locator. LOCATOR may be the null locator. The element's location is relative to the root of the structure being
defined and is specified by ELEMENT_LOC_BYTE and ELEMENT_LOC_BIT.

The size of the structure element is specified, in bits, by ELEMENT_SIZE. ELEMENT_SIZE is specified to support
definition of the struct elements c1 and c2 in the following C program fragment.



typedef struct m1 {
    char c1 : 4;
    char c2 : 4;
};
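
To make the roles of ELEMENT_LOC_BYTE, ELEMENT_LOC_BIT and ELEMENT_SIZE concrete for the fragment above, the following Python sketch computes a byte offset, a bit offset and a size in bits for consecutive bit fields. It assumes, purely for illustration, that fields are packed contiguously starting at bit 0 of byte 0; real C compilers choose bit-field layout in an implementation-defined way, and the function name layout_bitfields is hypothetical.

```python
from typing import List, Tuple


def layout_bitfields(fields: List[Tuple[str, int]]) -> List[Tuple[str, int, int, int]]:
    """Map (name, size_in_bits) pairs to (name, loc_byte, loc_bit, size_in_bits)."""
    placed = []
    bit_position = 0  # running offset in bits from the root of the structure
    for name, size in fields:
        placed.append((name, bit_position // 8, bit_position % 8, size))
        bit_position += size
    return placed
```

Under this packing assumption, c1 would be described with byte 0, bit 0, size 4, and c2 with byte 0, bit 4, size 4.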



TYPE_NODE =
    GEM_TD_DEF_STRUCT_SELECTOR(
        STRUCT_TYPE       : value,
        VARIANT_PARENT    : value,
        LOCATOR           : value,
        ELEMENT_NAME      : in VS_STR,
        ELEMENT_TYPE      : value,
        ELEMENT_LOC_BYTE  : in GEM_LITERAL_NODE,
        ELEMENT_LOC_BIT   : in GEM_LITERAL_NODE,
        ELEMENT_SIZE      : in GEM_LITERAL_NODE)

Define a selector for variant components of a record. A selector is a structure element which determines the variant
of a structure. The selector element is named ELEMENT_NAME and has a type defined by the type node handle
ELEMENT_TYPE. VARIANT_PARENT is the immediate parent variant of the selector element, or null if the element
is not a member of a variant. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator. The element's
location is relative to the root of the structure being defined and is specified by ELEMENT_LOC_BYTE and
ELEMENT_LOC_BIT. The size of the structure element is specified, in bits, by ELEMENT_SIZE.



TYPE_NODE =
    GEM_TD_DEF_STRUCT_VARIANT(
        STRUCT_TYPE  : value,
        LOCATOR      : value)

Define a variant of a structure. SELECTOR_TYPE is the type node that selects the variant. LOCATOR is a GEM or
foreign locator. LOCATOR may be the null locator. The values of the selector that select the variant are specified by
the GEM_TD_SET_SELECTOR_RANGE and GEM_TD_SET_SELECTOR_DEFAULT routines.

GEM_TD_SET_SELECTOR_RANGE(
    VARIANT_TYPE      : value,
    LOCATOR           : value,
    RANGE_LOWER_BOUND : in GEM_LITERAL_NODE,
    RANGE_UPPER_BOUND : in GEM_LITERAL_NODE)

Define a selector range for the variant VARIANT_TYPE. LOCATOR is a GEM or foreign locator. LOCATOR may be
the null locator. When defining a single selector value RANGE_UPPER_BOUND should have the same value as
RANGE_LOWER_BOUND. Combinations of single selector values and selector ranges may be applied to a variant.

GEM_TD_SET_SELECTOR_DEFAULT(
    VARIANT_TYPE : value,
    LOCATOR      : value)

Define a variant type VARIANT_TYPE as being the default variant when not all of the values of its selector have been
enumerated. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator. When defining a scalar selector
value RANGE_UPPER_BOUND should have the same value as RANGE_LOWER_BOUND. Combinations of selector
scalars and ranges may be applied to a variant.
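
The way selector ranges and a default variant could interact when a consumer of the type description picks a variant is sketched below in Python. The class Variant and the function select_variant are hypothetical helpers for illustration only; the lookup order (explicit ranges first, default last) is an assumption and is not stated in the text.

```python
class Variant:
    """Hypothetical stand-in for a variant type node."""

    def __init__(self, name):
        self.name = name
        self.ranges = []        # inclusive (lower, upper) selector ranges
        self.is_default = False

    def set_selector_range(self, lower, upper):
        # A single selector value is a range with lower == upper.
        if lower > upper:
            raise ValueError("lower bound exceeds upper bound")
        self.ranges.append((lower, upper))

    def set_selector_default(self):
        self.is_default = True


def select_variant(variants, selector_value):
    """Return the variant chosen by selector_value, or None if none applies."""
    for variant in variants:
        if any(lo <= selector_value <= hi for lo, hi in variant.ranges):
            return variant
    for variant in variants:
        if variant.is_default:
            return variant
    return None


v1 = Variant("V1")
v1.set_selector_range(0, 0)          # single value
v2 = Variant("V2")
v2.set_selector_range(1, 1)          # single value ...
v2.set_selector_range(3, 5)          # ... combined with a range
v4 = Variant("V4")
v4.set_selector_default()            # chosen for all values not enumerated
variants = [v1, v2, v4]
```

With these definitions, a selector value of 4 falls in the second range of V2, while a value of 2 is not enumerated and therefore falls through to the default variant V4.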



TYPE_NODE =
GEM_TD_DEF_UNION(
    DECL_TYPE  : in_out GEM_BLOCK_NODE,
    LOCATOR    : value,
    TYPE_NAME  : in VS_STR,
    UNION_SIZE : in GEM_LITERAL_NODE)

Define a union. DECL_BLK is the block node in which the structure is declared. TYPE_NAME is a varying string
describing the type and may be null. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator.
UNION_SIZE is the size of the structure in bytes. The members of a union are defined by calls to the routine
GEM_TD_SET_UNION_MEMBER.

GEM_TD_SET_UNION_MEMBER(
    UNION_TYPE  : value,
    LOCATOR     : value,
    MEMBER_NAME : in VS_STR,
    MEMBER_TYPE : value)

Define a member of the union indicated by the type node UNION_TYPE. UNION_TYPE is the type node of the union
that contains the member. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator. MEMBER_NAME
is a varying string defining the name of the member. MEMBER_TYPE is the type node of the member being defined.

Definition Of Function And Routine Parameters 



TYPE_NODE =
GEM_TD_DEF_FUNCTION_TYPE(
    DECL_BLK      : in_out GEM_BLOCK_NODE,
    LOCATOR       : value,
    TYPE_NAME     : in VS_STR,
    FUNCTION_TYPE : value)

Define the type of a procedure parameter as being of the type specified by the type node FUNCTION_TYPE. Note
that this is not used to define the type of an entry symbol; rather, it describes parameters to routines. DECL_BLK is the
block node in which the type is defined. LOCATOR is a GEM or foreign locator. LOCATOR may be the null locator.
TYPE_NAME is a varying string describing the type and may be null.

EXAMPLES

The following examples describe a number of types and symbols and the mechanisms that would be used to
describe them to GEM. Note that the Pascal type boolean is defined as an enumeration over the GEM type uint32.
Examples Of Basic Types 

main() {
    int a;
    unsigned int ua;
    float x;
    double xx;
    char str[] = "Hello, world\n";

TYPINT32 = GEM_TD_DEF_BASIC_TYPE(main_block, locator, 'int',
            GEM_TYP_K_INT32);
TYPUINT32 = GEM_TD_DEF_BASIC_TYPE(main_block, locator, 'unsigned int',
            GEM_TYP_K_UINT32);
TYPREALF = GEM_TD_DEF_BASIC_TYPE(main_block, locator, 'float',
            GEM_TYP_K_REALF);
TYPREALG = GEM_TD_DEF_BASIC_TYPE(main_block, locator, 'double',
            GEM_TYP_K_REALG);
TYPCHAR8 = GEM_TD_DEF_BASIC_TYPE(main_block, locator, 'char',
            GEM_TYP_K_INT8);
TYPSTRING = GEM_TD_DEF_STRING(
            main_block, locator,
            'string',
            GEM_STRREP_K_ASCIZ,
            TYPCHAR8,
            litnode(len(str)) );

Example Definition Of Type Boolean

procedure bt;
boolean myflag;

TYPUINT32 = GEM_TD_DEF_BASIC_TYPE(bt_block, locator, 'unsigned int',
            GEM_TYP_K_UINT32);
TYPBOOL = GEM_TD_DEF_ENUM(bt_block, locator, 'boolean', TYPUINT32);
GEM_TD_SET_ENUM_ELEMENT(TYPBOOL, locator, 'false', litnode(val=0) );
GEM_TD_SET_ENUM_ELEMENT(TYPBOOL, locator, 'true', litnode(val=1) );
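
The pattern used above, in which an enumeration is first defined over a base type and its elements are then attached one call at a time, can be mirrored by a small Python sketch. EnumType and set_element are hypothetical names chosen for this illustration, and the duplicate-name check is an assumption rather than documented GEM behaviour.

```python
class EnumType:
    """Hypothetical model of an enumeration type node over a base type."""

    def __init__(self, name, base_type):
        self.name = name
        self.base_type = base_type
        self.elements = {}  # element name -> literal value

    def set_element(self, element_name, value):
        # Assumption of this sketch: element names are unique within a type.
        if element_name in self.elements:
            raise ValueError("duplicate enumeration element: " + element_name)
        self.elements[element_name] = value


typbool = EnumType("boolean", "uint32")
typbool.set_element("false", 0)
typbool.set_element("true", 1)
```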
Examples Of Character And Bit Aggregates

routine testit(parm1, ...) =
begin
own status : bitvector[15],
    flagbits : bitvector[8];
bind dbits = .parm1 : bitvector[];

TYPBITS1 = GEM_TD_DEF_BITSTRING(testit_block, locator, 'bitvector',
            litnode(val=0), litnode(val=14) );
TYPBITS2 = GEM_TD_DEF_BITSTRING(testit_block, locator, 'bitvector',
            litnode(val=0), litnode(val=7) );
TYPBITS3 = GEM_TD_DEF_BITSTRING(testit_block, locator, 'bitvector',
            litnode(val=0), litnode(val=1) );

Examples Of Pointers And Typedefs

int echo() {
    struct tnode {

    }

    typedef struct tnode ssval;
    tnode *tp;
    znode *zp;

    struct znode {

    }

TYPSTRUCT1 = definition of structure tnode

! Define ssval as alias for tnode.

TYPALIAS = GEM_TD_DEF_TYPEDEF(echo_block, locator, 'ssval', TYPSTRUCT1);
TYPPTR1 = GEM_TD_DEF_POINTER(echo_block, locator, 'null', TYPSTRUCT1);

! Define an "anonymous" pointer, then structure znode. Finally modify
! the pointer type.

TYPPTR2 = GEM_TD_DEF_POINTER(echo_block, locator, 'pointer', null);
TYPSTRUCT2 = definition of structure znode
GEM_TD_DEF_POINTER_TYPE(TYPPTR2, TYPSTRUCT2);
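
The second half of the example above defines a pointer before the type it points to exists and fills in the target afterwards. A minimal Python sketch of that two-step pattern follows; TypeNode, def_pointer and set_pointer_type are hypothetical names for illustration and do not reproduce the GEM routines or their argument lists.

```python
class TypeNode:
    """Hypothetical type node; 'target' is only meaningful for pointers."""

    def __init__(self, kind, name=None, target=None):
        self.kind = kind
        self.name = name
        self.target = target  # pointed-to type node, None until it is known


def def_pointer(name, target=None):
    # The target may be omitted when the pointed-to type is not yet defined.
    return TypeNode("pointer", name, target)


def set_pointer_type(pointer, target):
    # Patch the pointer once the pointed-to type has been defined.
    if pointer.kind != "pointer":
        raise TypeError("not a pointer type node")
    pointer.target = target


zp_type = def_pointer("pointer")       # znode is not defined yet
znode = TypeNode("struct", "znode")    # now define the structure
set_pointer_type(zp_type, znode)       # finally fix up the pointer
```

Deferring the target in this way is what allows mutually or self-referential types, such as a record containing a pointer to itself, to be described in a single forward pass.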
Examples Of Ranges, Enumerations And Sets

void myproc() {

type
    dn1 = 0..6;
    dn2 = 100..105;
    dn3 = 66000..66001;
    weekday = (mon,tue,wed,thu,fri);
    t_typ = (int,re,boo);

var
    s1 : set of dn1;
    s2 : set of weekday;
    s3 : set of t_typ;

! Define range dn1.

TYPUINT8 = GEM_TD_DEF_BASIC_TYPE(myproc_block, locator, null,
            GEM_TYP_K_UINT8);
TYPRANGE1 = GEM_TD_DEF_RANGE(myproc_block, locator, 'dn1',
            TYPUINT8, litnode(val=0), litnode(val=6));

! Define range dn2.

TYPRANGE2 = GEM_TD_DEF_RANGE(myproc_block, locator, 'dn2',
            TYPUINT8, litnode(val=100), litnode(val=105));

! Define range dn3.

TYPINT32 = GEM_TD_DEF_BASIC_TYPE(myproc_block, locator, null,
            GEM_TYP_K_UINT32);
TYPRANGE = GEM_TD_DEF_RANGE(myproc_block, TYPINT32, 'dn3',
            litnode(val=66000), litnode(val=66001) );

TYPENUM1 = GEM_TD_DEF_ENUM(myproc_block, locator, 'weekday', TYPUINT8);
GEM_TD_SET_ENUM_ELEMENT(TYPENUM1, locator, 'mon', litnode(val=0) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM1, locator, 'tue', litnode(val=1) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM1, locator, 'wed', litnode(val=2) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM1, locator, 'thu', litnode(val=3) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM1, locator, 'fri', litnode(val=4) );

TYPENUM2 = GEM_TD_DEF_ENUM(myproc_block, locator, 't_typ', TYPEUINT32);
GEM_TD_SET_ENUM_ELEMENT(TYPENUM2, locator, 'int', litnode(val=0) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM2, locator, 're', litnode(val=1) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM2, locator, 'boo', litnode(val=2) );

! Define the sets for vars s1, s2 and s3.

TYPSET1 = GEM_TD_DEF_SET(myproc_block, locator, 'set', TYPRANGE1);
TYPSET2 = GEM_TD_DEF_SET(myproc_block, locator, 'set', TYPENUM1);
TYPSET3 = GEM_TD_DEF_SET(myproc_block, locator, 'set', TYPENUM2);



Examples Of Arrays 

procedure dimmer;
type
    nd = record

var
    ary1 : array[1..10] of integer;
    ary2 : array[1..10,100..110] of integer;
    ary3 : array[900..1700] of nd;
    ary4 : array['a'..'z'] of nd;

TYPSTRUCT1 = Definition of record type nd.

! Define array 'ary1'.

TYPINT32 = GEM_TD_DEF_BASIC_TYPE(dimmer_block, locator, null,
            GEM_TYP_K_INT32);
TYPARRAY = GEM_TD_DEF_ARRAY(dimmer_block, locator, null, TYPINT32, 1);
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 1,
            litnode(val=1), litnode(val=10),
            TYPINT32, litnode(value=4) );

! Define array 'ary2'.

TYPARRAY = GEM_TD_DEF_ARRAY(dimmer_block, locator, null, TYPINT32, 2);
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 1,
            litnode(val=1), litnode(val=10),
            TYPINT32, litnode(value=4) );
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 2,
            litnode(val=100), litnode(val=110),
            TYPINT32, litnode(value=40) );

! Alternatively, the array specification for ary2 may be defined as:

TYPARRAY1 = GEM_TD_DEF_ARRAY(dimmer_block, locator, null, TYPINT32, 1);
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY1, locator, 1,
            litnode(val=100), litnode(val=110),
            TYPINT32, litnode(value=4) );
TYPARRAY2 = GEM_TD_DEF_ARRAY(dimmer_block, locator, null, TYPARRAY1, 1);
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY2, locator, 1,
            litnode(val=1), litnode(val=10),
            TYPINT32, litnode(value=40) );

! Define array 'ary3'.

TYPARRAY = GEM_TD_DEF_ARRAY(dimmer_block, locator, null, TYPSTRUCT1, 1);
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 1,
            litnode(val=900), litnode(val=1700),
            TYPINT32, sizeof(nd) );
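
Reading the last argument of each bounds call above as a per-dimension stride in bytes, which is an assumption of this sketch and not something the example states, the byte offset of an element follows from the lower bounds and strides alone. The Python function element_offset below is a hypothetical helper written for illustration; it uses the two dimensions given for 'ary2' in the first form of the example.

```python
def element_offset(dims, indices):
    """Byte offset of an array element from the start of the array.

    dims    -- one (lower_bound, upper_bound, stride_in_bytes) per dimension
    indices -- one index per dimension
    """
    if len(dims) != len(indices):
        raise ValueError("wrong number of indices")
    offset = 0
    for (lower, upper, stride), index in zip(dims, indices):
        if not lower <= index <= upper:
            raise IndexError("index outside the declared bounds")
        offset += (index - lower) * stride
    return offset


# Dimension 1: bounds 1..10, stride 4; dimension 2: bounds 100..110, stride 40.
ary2_dims = [(1, 10, 4), (100, 110, 40)]
first = element_offset(ary2_dims, (1, 100))
other = element_offset(ary2_dims, (3, 102))
```

Under this reading the first element sits at offset 0 and element (3, 102) sits at (3 - 1) * 4 + (102 - 100) * 40 bytes.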
Examples Of Adjustable Array Definition

subroutine x(cv,ary1,ary2,a,b)
character*(*) cv
dimension ary1(1:10,1:b)
dimension ary2(a:b,1:*)

TYPINT32 = GEM_TD_DEF_BASIC_TYPE(x_block, locator, null, GEM_TYP_K_INT32);
TYPCHAR = GEM_TD_DEF_CHAR_TYPE(x_block, locator, null, GEM_TYP_K_INT8);

! Define array 'cv'.

TYPARRAY = GEM_TD_DEF_ARRAY(x_block, locator, null, TYPCHAR, 1);
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 1,
            litnode(val=1), litnode(val=1),
            TYPINT32, litnode(val=1) );

! Define array 'ary1'.

TYPREALF = GEM_TD_DEF_BASIC_TYPE(x_block, locator, null,
            GEM_TYP_K_REALF);
TYPARRAY = GEM_TD_DEF_ARRAY(x_block, locator, TYPREALF, 2);
            2, litnode(val=4) );
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, 1, locator,
            litnode(val=1), litnode(val=10),
            TYPINT32, litnode(val=4) );
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, 2, locator,
            litnode(val=1), b_symbol,
            TYPINT32, litnode(val=4) );

! Define array 'ary2'.

TYPARRAY = GEM_TD_DEF_ARRAY(x_block, locator, null, TYPREALF,
            TYPINT32, 2, litnode(val=4) );
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 1,
            a_symbol, b_symbol,
            TYPINT32, litnode(val=4) );
GEM_TD_SET_ARRAY_BOUNDS(TYPARRAY, locator, 2,
            litnode(val=1), litnode(val=1),
            TYPINT32, litnode(val=4) );
type
    t_typ = (it, re, ptr, v1, v2, v3);
    ndp = @nd;
    nd = record
        nxt : ndp;
        case tt : t_typ of
            it : (iv : integer);
            re : (rv : real);
            ptr : (pv : ndp; sum : integer);
            otherwise : (i1 : integer; i2 : real);
    end;

! Define basic types used in example.

TYPINT32 = GEM_TD_DEF_BASIC_TYPE(typeit_block, locator,
            'integer',
            GEM_TYP_K_INT32);
TYPREALF = GEM_TD_DEF_BASIC_TYPE(typeit_block, locator,
            'real',
            GEM_TYP_K_REALF);
TYPNIL = GEM_TD_DEF_BASIC_TYPE(typeit_block, locator, null,
            GEM_TYP_K_NIL);

! Define ndp pointer to nd.

TYPPTR = GEM_TD_DEF_POINTER(typeit_block, locator, 'ndp', TYPNIL);

! Define the t_typ enumeration.

TYPENUM = GEM_TD_DEF_ENUM(myproc_node, locator, 't_typ', TYPINT32);
GEM_TD_SET_ENUM_ELEMENT(TYPENUM, locator, 'it', litnode(val=0) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM, locator, 're', litnode(val=1) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM, locator, 'boo', litnode(val=2) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM, locator, 'v1', litnode(val=3) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM, locator, 'v2', litnode(val=4) );
GEM_TD_SET_ENUM_ELEMENT(TYPENUM, locator, 'v3', litnode(val=5) );

! Define the structure definition nd.

TYPSTRUCT = GEM_TD_DEF_STRUCT(typeit_block, locator, 'nd',
            litnode(nd_size));
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'nxt', TYPPTR,
            litnode(l_byte(nxt)), litnode(l_bit(nxt)), litnode(bit_size(nxt)) );

! Define the selector for variant parts.

TYPSEL = GEM_TD_DEF_STRUCT_SELECTOR(TYPSTRUCT, null, 'tt', TYPENUM,
            litnode(l_byte(tt)), litnode(l_bit(tt)), litnode(bit_size(tt)) );

! Define the variants of the structure including a default.

V1 = GEM_TD_DEF_STRUCT_VARIANT(TYPSEL, locator);
GEM_TD_SET_SELECTOR_RANGE(V1, locator, litnode(val=0),
            litnode(val=0) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, V1, locator, 'iv', TYPINT,
            litnode(l_byte(iv)), litnode(l_bit(iv)), litnode(bit_size(iv)) );

V2 = GEM_TD_DEF_STRUCT_VARIANT(TYPSEL, locator);
GEM_TD_SET_SELECTOR_RANGE(V2, locator, litnode(val=1),
            litnode(val=1) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, V2, locator, 'rv',
            TYPREALF,
            litnode(l_byte(rv)), litnode(l_bit(rv)), litnode(bit_size(rv)) );

V3 = GEM_TD_DEF_STRUCT_VARIANT(TYPSEL, locator);
GEM_TD_SET_SELECTOR_RANGE(V3, locator, litnode(val=2),
            litnode(val=2) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, V3, locator, 'pv', TYPPTR,
            litnode(l_byte(pv)), litnode(l_bit(pv)), litnode(bit_size(pv)) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, V3, locator, 'sum', TYPPTR,
            litnode(l_byte(sum)), litnode(l_bit(sum)), litnode(bit_size(sum)) );

V4 = GEM_TD_DEF_STRUCT_VARIANT(TYPSEL, locator);
GEM_TD_SET_SELECTOR_RANGE(V4, locator);
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, V4, locator, 'i1', TYPINT,
            litnode(l_byte(i1)), litnode(l_bit(i1)), litnode(bit_size(i1)) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, V4, locator, 'i2', TYPINT,
            litnode(l_byte(i2)), litnode(l_bit(i2)), litnode(bit_size(i2)) );

GEM_TD_SET_POINTER_TYPE(TYPPTR, TYPSTRUCT);

Examples Of Structures And Union Definition

main() {

    struct dim3 {
        int x;
        int y;
        int z;
    };

    union anon {
        int ival;
        float fval;
        char *pval;
        struct dim3 loc;
    };

    struct n1 {
        union anon a;
        union anon b;
        union anon c;
    };

    struct n1 n11, n12, n13;

TYPINT32 = GEM_TD_DEF_BASIC_TYPE(main_block, locator, 'int',
            GEM_TYP_K_INT32);
TYPREALF = GEM_TD_DEF_BASIC_TYPE(main_block, locator, null,
            GEM_TYP_K_REALF);
TYPCHAR = GEM_TD_DEF_CHAR_TYPE(main_block, locator, null,
            GEM_TYP_K_UINT8);
TYPPTR = GEM_TD_DEF_POINTER(main_block, locator, null, TYPCHAR);

! Define structure 'dim3'.

TYPSTRUCT = GEM_TD_DEF_STRUCT(main_block, locator, 'dim3',
            litnode(dim3_size) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'x', TYPINT32,
            loc_byte(x), loc_bit(x), litnode(x_size));
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'y', TYPINT32,
            loc_byte(y), loc_bit(y), litnode(y_size));
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'z', TYPINT32,
            loc_byte(z), loc_bit(z), litnode(z_size));

! Define the union 'anon'.

TYPUNION = GEM_TD_DEF_UNION(main_block, locator, 'anon',
            litnode(anon_size) );
GEM_TD_SET_UNION_MEMBER(TYPUNION, locator, 'ival', TYPINT32);
GEM_TD_SET_UNION_MEMBER(TYPUNION, locator, 'fval', TYPREALF);
GEM_TD_SET_UNION_MEMBER(TYPUNION, locator, 'pval', TYPPTR);
GEM_TD_SET_UNION_MEMBER(TYPUNION, locator, 'loc', TYPSTRUCT);

! Define the structure 'n1'.

TYPSTRUCT = GEM_TD_DEF_STRUCT(main_block, locator, 'n1',
            litnode(n1_size));
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'a', TYPUNION,
            loc_byte(a), loc_bit(a),
            litnode(anon_size) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'b', TYPUNION,
            loc_byte(b), loc_bit(b),
            litnode(anon_size) );
GEM_TD_SET_STRUCT_ELEMENT(TYPSTRUCT, null, locator, 'c', TYPUNION,
            loc_byte(c), loc_bit(c),
            litnode(anon_size) );
Examples Of Function Parameter Definition

function x (function grin : real;
            procedure bearit) : integer;

TYPREALF = GEM_TD_DEF_BASIC_TYPE(x_block, locator, 'real',
            GEM_TYP_K_REALF);
TYPNIL = GEM_TD_DEF_BASIC_TYPE(x_block, locator, null,
            GEM_TYP_K_NIL);

! Define type for function parameter 'grin'.

TYPPROC = GEM_TD_DEF_FUNCTION_TYPE(x_block, locator, null, TYPREALF);

! Define type for procedure parameter 'bearit'.

TYPFUNCT = GEM_TD_DEF_FUNCTION_TYPE(x_block, locator, null, TYPNIL);



Claims 

1. A method of performing code generation executed in a computer system comprising the steps of:

matching a portion of an intermediate language graph (31-34) with a code template, said portion of said
intermediate language graph comprising a plurality of tuples (35) and being a representation of a first source
program (21) in an intermediate language, each tuple representing a single operation to be performed in said
first source program, said code template including a predetermined intermediate language pattern which
corresponds to said portion of said intermediate language graph to guide in generating a part of a first object
module that corresponds to said first source program;

analyzing said portion of said intermediate language graph to determine an order of evaluation of an expression 
in said portion of said intermediate language graph using actions which are included in said code template 
and which indicate said order of evaluation; 

allocating, using said actions in accordance with said order of evaluation, a temporary name for a variable 
and assigning an allocation lifetime to said temporary name, said allocation lifetime indicating the extent to 
which the temporary name and the storage of the temporary name are associated with said variable in said
intermediate language graph; 

updating at least one of said plurality of tuples in said portion of said intermediate language graph as needed 
to perform said analyzing and said allocating steps as indicated by said actions; and 
generating machine language instructions that are included in said part of said object module (23) by using 
said actions and said intermediate language graph having at least one updated tuple. 

2. The method of claim 1, wherein said temporary name is one of a local temporary or non-local temporary name,
said local temporary name being a variable allocation with a lifetime limited to said portion of said intermediate
language graph, said non-local temporary name being a variable allocation with a lifetime that extends beyond 
said portion of said intermediate language graph. 

3. The method of claim 2, wherein said matching step is performed during a first pass over said portion of said 
intermediate language graph, said analyzing step and said allocating step for allocating non-local temporary names 
are performed during a second pass over said portion of said intermediate language graph, said portion being 
updated and producing a modified portion of said intermediate language graph, said allocating step for allocating
local temporary names is performed during a third pass using said modified portion of said intermediate language
graph, said modified portion being updated and producing another modified portion of said intermediate language,
and said generating step is performed during a fourth pass over said other modified portion of said intermediate
language graph. 

4. The method of claim 3, wherein each of said second, said third, and said fourth passes is performed by an action 
interpreter that uniquely processes a part of said actions for said each pass. 

5. The method of claim 3, wherein said actions include one or more undelayed actions and one or more delayed 
actions, said undelayed actions being performed prior to said delayed actions for each of said second, said third, 
and said fourth passes and producing a processing result which is used in one of said delayed actions. 

6. The method of claim 1, wherein one of said actions indicates that said variable associated with said temporary
name is stored in a memory.

7. The method of claim 1, wherein one of said actions indicates that said variable associated with said temporary 
name is stored in a register.

8. The method of claim 1, wherein said predetermined language pattern includes a result value mode, a pattern tree,
a sequence of boolean tests, and a cost of the code generated by said code template, said result value mode
indicating the result value computed by said portion of said first object module, said pattern tree describing
operators and operands of said predetermined intermediate language pattern, said sequence of boolean tests
representing statements about said portion of said intermediate language graph that must be true for said
predetermined intermediate language pattern to be matched to said portion, said cost being represented as an
integer and indicating a performance cost associated with said portion of generated code.

9. The method of claim 8, wherein said matching step includes comparing a cost associated with said code template 
and another cost associated with another code template, and choosing one of said code template and said other 
code template with the minimum associated cost.

10. The method of claim 1, wherein said first object module is for a first target computer system and wherein the
method of claim 15 is used to produce a second object module for another target computer system, said second
object module corresponding to a second source program different from said first source program.

11. A compiler for use in a computer to perform code generation during the compiling of a source program to produce
a corresponding object module, the compiler comprising:

a code comparator which matches a portion of an intermediate language graph (55) with a code template,
said intermediate language graph comprising tuples (35) and being a representation of a first source program
(21) in an intermediate language, each tuple representing a single operation to be performed in said first source
program, said code template including a predetermined intermediate language pattern which corresponds to
said portion of said intermediate language graph to guide in generating a part of a first object module that
corresponds to said first source program;

a code analyzer for analyzing said portion of said intermediate language graph to determine an order of
evaluation of an expression in said portion of said intermediate language graph using actions which are included
in said code template and which indicate said order of evaluation;

an allocator that uses said actions in accordance with said order of evaluation to allocate a temporary name
for a variable and assigning an allocation lifetime to said temporary name, said allocation lifetime indicating
the extent to which the temporary name and the storage of the temporary name are associated with said
variable in said intermediate language graph;

means for updating a tuple in said portion of said intermediate language graph as needed by the analyzer and
the allocator to perform, respectively, the analyzing and the allocating as indicated by said actions; and

means for generating machine language instructions that are included in said part of said object module (23)
by using said actions and said intermediate language graph having updated tuples.

12. The apparatus of claim 11, wherein said allocator further includes a local temporary name allocator and a non- 
local temporary name allocator, said local temporary name allocator allocating variables with a lifetime limited to 
said portion of said intermediate language graph, said non-local temporary name allocator allocating variables 
with a lifetime that extends beyond said portion of said intermediate language graph. 
13. The apparatus of claim 11, wherein said predetermined language pattern includes a result value mode, a pattern
tree, a sequence of boolean tests, and a cost of the code generated by said code template, said result value mode
indicating the result value computed by said portion of said first object module, said pattern tree describing
operators and operands of said predetermined intermediate language pattern, said sequence of boolean tests
representing statements about said portion of said intermediate language graph that must be true for said
predetermined intermediate language pattern to be matched to said portion, said cost being represented as an
integer and indicating a performance cost associated with said portion of generated code.

14. The apparatus of claim 11, wherein said first object module is for a first target computer system and wherein the
apparatus of claim 11 is used to produce a second object module for another target computer system, said second
object module corresponding to a second source program different from said first source program. 



PatentansprQche 

1 . Verfahren zum Ausf uhren einer Codeerzeugung. das in einem Computersystem ausgef uhrt wird, mit den folgenden 
Schritten: 

Anpassen eines Abschnitts eines Zwischensprachengraphen (31 -34) an eine Codeschablone, wobei der Ab- 
schnitt des Zwiischensprachengraphen mehrere Tupel (35) enthalt und eine Darstellung eines ersten Quell- 
programms (21 ) in einer Zwischensprache ist, wobei jedes Tupel eine einzelne Operation darstettt. die in dem 
ersten Quellprogramm auszuf Ohren ist. wobei die Codeschablone ein vorgegebenes Zwischensprachenmu- 
ster enthalt, das dem Abschnttt des Zwischensprachengraphen entspricht, um die Erzeugung eines Teils eines 
ersten Objektmoduls. das dem ersten Quellprogramm entspricht. zu lenken; 

Analysieren des Abschnitts des Zwischensprachengraphen, um eine Evaluierungsreihenfolge eines Aus- 
drucks In dem Abschnitt des Zwischensprachengraphen unter Verwendung von Aktionen zu bestimmen, die 
in der Codeschablone enthalten simJ und die die Evaluierungsreihenfolge angeben; 
Zuordnen eines vorubergehenden Namens an eine N^riable unter Verwendung der Akttonen in Uberelnstlm- 
mung mit der Evaluierungsreihenfolge und Zuwelsen einer Zuordnungslebensdauer an den vorubergehenden 
Namen, wobei die Zuordnungslebensdauer das Ausma3 anglbt. In dem der vorubergehende Name und die 
Speicherung des vorubergehenden Namens der N^riable In dem Zwischensprachengraphen zugeordnet sind; 
Aktualisieren wenigstens eines der mehreren Tupel in dem Abschnitt des Zwischensprachengraphen nach 
Bedarf . um die Analyse- und Zuordnungsschritte wie durch die Aktbnen angegeben auszufuhren; und 
Erzeugen von Maschinensprachenbefehlen, die in dem Tell des Objektmoduls (23) enthalten sind, unter Ver- 
wendung der Aktkxien und des Zwischensprachengraphen. der wenigstens ein aktualisiertes Tupel enthalt. 

2. Verfahren nach Anspruch 1 , bei dem der vorubergehende Name entweder ein lokaler vorubergehender Name 
Oder ein nicht bkaler vorubergehender Name ist, wobei der bkale vorubergehende Name eine N^rlablenzuordnung 
mit einer Lebensdauer ist. die auf den Abschnitt des Zwischensprachengraphen eingeschrankt ist, und wobei der 
nicht tokale vorubergehende Name eine N^riablenzuordnung mit einer Lebensdauer ist, die sich Qber den Abschnitt 
des Zwischensprachengraphen htnaus erstreckt. 

3. Verfahren nach Anspruch 2. bei dem der Anpassungsschritt wahrend eines ersten Du rchlauf s durch den Abschnitt 
des Zwischensprachengraphen ausgef uhrt wird, der Analyseschritt und der Zuordnungsschritt zum Zuordnen nicht 
tokaler vorubergehender Namen wahrend eines zwelten Durchlaufs durch den Abschnitt des Zwischensprachen- 
graphen ausgefOhrt werden, wot>ei der Abschnitt aktualisiert wird und einen abgewandelten Abschnitt des Zwi- 
schensprachengraphen erzeugt, der Zuordnungsschritt zum Zuordnen tokaler vorubergehender Namen wahrend 
eines dritten Durchlaufs unter Venwendung des abgewandelten Abschnitts des Zwischensprachengraphen aus- 
gefOhrt wird. wobei der abgewandette Abschnitt aktualisiert wird und einen weiteren abgewandelten Abschnitt der 
Zwischensprache erzeugt, und der Erzeugungsschrttt wahrend eines vierten Durchlaufs durch den weiteren ab- 
gewandelten Abschnitt des Zwischensprachengraphen ausgefuhit wird. 

4. Verfahren nach Anspruch 3, bei dem jeder der zweiten. dritten und vierten Durchlaufe durch einen Aktionsinter- 
pretierer ausgef uhrt wird, der ausschliedlich einen Teil der Aktionen fOr jeden der Durchlaufe verarbeitet. 

5. Verfahren nach Anspruch 3, bei dem die Aktionen eine oder mehrere nicht verzogerte Aktionen und eine Oder 
mehrere verzogerte Aktionen enthalten, wobei die nicht verzogerten Aktionen fur jeden der zweiten. dritten und 
vierten Durchlaufe vor den verzogerten Aktionen ausgef uh rt werden und ein Vdrarbeitungsergebnis erzeugen, das 
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in einer der verzogerten Aktionen verwendet wird. 

6. Verfahren nach Anspruch 1 , bei dem eine der Aktionen angibt. daB die dem vorubergehenden Namen zugeordnete 
>^riable in einem Speicher gespeichert ist. 

7. Verfahren nach Anspruch 1 , bei dem eine der Aktbnen angibt, daB die dem vorubergehenden Namen zugeordnete 
Vlariable in einem Register gespeichert ist. 

a Verfahren nach Anspruch 1, bei dem das vorgegebene Sprachmuster einen Ergebniswertmodus, einen l\^uster- 
baum, eine Folge von Boolschen Prut ungen sowie Kosten des von der Codeschabtone erzeugten Codes enthalt, 
wobei der Ergebniswertmodus den vom Abschnitl des ersten Objektmoduls berechneten Ergebniswert angibt. der 
Musterbaum Operatoren und Operanden des vorgegebenen Zwischensprachenniusters beschreibt, die Folge von 
Boolschen Prut ungen Aussagen Qber den Abschnitt des Zwischenbsprachengraphen darstellt. die fur das vorge- 
gebene Zwischensprachenmuster, das an den Abschnitt anzupassen ist, wahr sein mussen, und die Kosten durch 
eine ganze Zahl dargestellt sind und Betriebskosten angeben, die mit dem Abschnitt des erzeugten Codes in 
Beziehung stehen. 

9. Verfahren nach Anspruch 8. bei dem der Anpassungsschritt das Verglek:hen von Kosten, die mlt der Codescha- 
btone in Beziehung stehen, mit weiteren Kosten. die mit einer weiteren Codeschabtone in Beziehung stehen. sowie 
die V\tehl einer der Codeschabtonen und der anderen Codeschablone mil den minimaien zugehorigen Kosten 
enthalt. 

10. Verfahren nach Anspruch 1 , bei dem das erste Objektmodul einem ersten Ziel-Computersystem dient und bei dem 
das Verfahren nach Anspmch 15 verwendet wird. um ein zweites Objektmodul fur ein weiteres Ziel-Computersy- 
stem zu erzeugen. wobei das zweite Objektmodul einem zweiten Quellprogramm entspricht. das vom ersten Quell- 
programm verschieden Ist. 

1 1 . Compiler fur die Verwendung in einem Computer zum Ausf uhren einer Codeerzeugung wahrend des Kompilierens 
eines Quellprogramms, um ein entsprechendes Objektmodul zu erzeugen. wobei der Compiler enthalt 

einen Codekomparator, der einen Abschnitt eines Zwischensprachengraphen (55) an eine Codeschablone 
anpaBt, wobei der Zwischensprachengraph Tupel (35) enthalt und eine Darstellung eines ersten Quellpro- 
gramms (21) in einer Zwischensprache ist, wobei jedes Tupel eine einzelne Operation darstellt, die in dem 
ersten Quellprogramm auszufQhren ist, wobei dto Codeschablone eh vorgegebenes Zwischensprachenmu- 
ster enthalt. das dem Abschnitt des Zwischensprachengraphen entspricht, um die Erzeugung eines Teils eines 
ersten Objektmoduls, der dem ersten Quellprogramm entspricht. zu lenken; 

einen Codeanalysator zum Analysieren des Abschnitts des Zwischensprachengraphen, um eine Reihenfolge 
der Evaluiemng eines Ausdrucks in dem Abschnitt des Zwischensprachengraphen unter Vbnwendung von 
Akttonen zu bestimmen. die in der Codeschablone enthatten sind und die die Evalutemngsreihenfolge ange- 
ben; 

eine Zuordnungseinrichtung, die die Akttonen in Obereinstimmung mit der Evaluierungsreihenfolge venwendet, 
um einen vorubergehenden Namen an eine Variable zuzuordnen, und die dem vorubergehenden Namen eine 
Zuordnungslebensdauer zuweist. wobei die Zuordnungslebensdauer das AusmaB angibt. in dem der voruber- 
geherKle Name und die Spetoherung des vorubergehenden Namens der Variable in dem Zwischensprachen- 
graphen zugeordnet sind; 

eine EInrichtung zum Aktualisieren eines Tupels in dem Abschnitt des Zwischensprachengraphen nach Bedarf 
durch den Analysator und durch die Zuordnungseinrichtung. um die Analyse bzw. die Zuordnung wie durch 
die Aktionen angegeben auszufQhren; und 

eine Einrrchtung zum Erzeugen von Maschinensprachenbefehlen. die in dem Teil des Objektmoduls (23) ent- 
halten sind, indem sie die Akttonen und den Zwischensprachengraphen mit aktualisierten Tupein verwendet. 

12. Apparatus according to claim 11, wherein the allocating means further comprises a local temporary name
allocating means and a non-local temporary name allocating means, the local temporary name allocating
means allocating variables having a lifetime that is limited to the portion of the intermediate language
graph, and the non-local temporary name allocating means allocating variables having a lifetime that
extends beyond the portion of the intermediate language graph.

13. Apparatus according to claim 11, wherein the predetermined language pattern comprises a result value mode,
a pattern tree, a sequence of Boolean tests, and a cost of the code generated by the code template, the
result value mode indicating the result value computed by the portion of the first object module, the
pattern tree describing operators and operands of the predetermined intermediate language pattern, the
sequence of Boolean tests representing assertions about the portion of the intermediate language graph
that must be true for the predetermined intermediate language pattern to be matched to the portion, and
the cost being represented by an integer and indicating a performance cost associated with the portion of
the generated code.

14. Apparatus according to claim 11, wherein the first object module is intended for a first target computer
system, and wherein the apparatus according to claim 11 is used to generate a second object module for
another target computer system, the second object module corresponding to a second source program that
differs from the first source program.
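
The distinction drawn in the apparatus claims between local and non-local temporary names can be illustrated with a minimal sketch. It is not the patented allocator. The `ILTuple` layout, the helper names `lifetimes` and `classify`, and the half-open index range used to delimit a "portion" of the graph are all assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ILTuple:
    """One intermediate-language tuple, i.e. a single operation (hypothetical layout)."""
    op: str
    result: Optional[str] = None      # temporary name defined by this tuple, if any
    operands: Tuple[str, ...] = ()    # temporary names read by this tuple


def lifetimes(tuples: List[ILTuple]) -> Dict[str, Tuple[int, int]]:
    """Map each temporary name to the (first, last) tuple index that mentions it."""
    spans: Dict[str, Tuple[int, int]] = {}
    for i, t in enumerate(tuples):
        names = list(t.operands) + ([t.result] if t.result else [])
        for name in names:
            first, _ = spans.get(name, (i, i))
            spans[name] = (first, i)
    return spans


def classify(tuples: List[ILTuple], start: int, end: int) -> Tuple[Set[str], Set[str]]:
    """Split temporaries whose lifetime overlaps tuples[start:end] into (local, non_local)."""
    local: Set[str] = set()
    non_local: Set[str] = set()
    for name, (first, last) in lifetimes(tuples).items():
        if last < start or first >= end:
            continue                   # lifetime does not overlap the portion
        if start <= first and last < end:
            local.add(name)            # lifetime confined to the portion
        else:
            non_local.add(name)        # lifetime extends beyond the portion
    return local, non_local


graph = [
    ILTuple("fetch", "t1"),
    ILTuple("fetch", "t2"),
    ILTuple("add", "t3", ("t1", "t2")),
    ILTuple("mul", "t4", ("t3", "t3")),
    ILTuple("store", None, ("t4", "t1")),
]
local, non_local = classify(graph, 1, 4)
print(sorted(local), sorted(non_local))   # ['t2', 't3'] ['t1', 't4']
```

In this sketch a lifetime is approximated by the first and last tuple that mentions a name; a real allocator would work from the evaluation order fixed by the template actions.
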
Claims

1. A method of code generation executed in a computer system, comprising the steps of:

comparing a portion of an intermediate language graph (31-34) with a code template, said portion of said
intermediate language graph comprising a plurality of tuples (35) and being a representation of a first
source program (21) in an intermediate language, each tuple representing a single operation to be performed
in said first source program, said code template comprising a predetermined intermediate language pattern
that corresponds to said portion of said intermediate language graph, to guide the generation of a portion
of a first object module that corresponds to said first source program;

analyzing said portion of said intermediate language graph to determine an order of evaluation of an
expression in said portion of said intermediate language graph, using actions that are included in said
code template and that indicate said order of evaluation;

allocating, using said actions in accordance with said order of evaluation, a temporary name for a variable,
and assigning an allocation lifetime to said temporary name, said allocation lifetime indicating the extent
to which the temporary name and the storage of the temporary name are associated with said variable in said
intermediate language graph;

updating at least one of the tuples of said plurality of tuples in said portion of said intermediate
language graph as required to perform said analyzing and allocating steps as indicated by said actions; and

generating machine language instructions that are included in said portion of said object module (23),
using said actions and said intermediate language graph having at least one updated tuple.



2. A method according to claim 1, wherein said temporary name is either a local temporary name or a non-local
temporary name, said local temporary name being a variable allocation having a lifetime that is limited to
said portion of said intermediate language graph, and said non-local temporary name being a variable
allocation having a lifetime that extends beyond said portion of said intermediate language graph.



3. A method according to claim 2, wherein said comparing step is performed during a first pass over said
portion of said intermediate language graph; said analyzing step and said allocating step for allocating
non-local temporary names are performed during a second pass over said portion of said intermediate
language graph, said portion being updated and producing a modified portion of said intermediate language
graph; said allocating step for allocating local temporary names is performed during a third pass using
said modified portion of said intermediate language graph, said modified portion being updated and
producing another modified portion of said intermediate language graph; and said generating step is
performed during a fourth pass over said other modified portion of said intermediate language graph.



4. A method according to claim 3, wherein each of said second, third and fourth passes is performed by an
action interpreter program that processes, in a manner unique to each pass, a portion of said actions for
said each pass.



5. A method according to claim 3, wherein said actions comprise one or more undelayed actions and one or more
delayed actions, said undelayed actions being executed before said delayed actions for each of said second,
third and fourth passes and producing a processing result that is used in one of said delayed actions.

6. A method according to claim 1, wherein one of said actions indicates that said variable associated with
said temporary name is stored in a memory.

7. A method according to claim 1, wherein one of said actions indicates that said variable associated with
said temporary name is stored in a register.



8. A method according to claim 1, wherein said predetermined language pattern comprises a result value mode,
a pattern tree, a sequence of Boolean tests, and a cost of the code generated by said code template, said
result value mode indicating the result value computed by said portion of said first object module, said
pattern tree describing operators and operands of said predetermined intermediate language pattern, said
sequence of Boolean tests representing assertions about said portion of said intermediate language graph
that must be true for said predetermined intermediate language pattern to be matched to said portion, and
said cost being represented as an integer and indicating a performance cost associated with said portion
of generated code.

9. A method according to claim 8, wherein said comparing step comprises comparing a cost associated with said
code template and another cost associated with another code template, and selecting whichever of said code
template and said other code template has the minimum cost associated with it.

10. A method according to claim 1, wherein said first object module is intended for a first target computer
system, and wherein the method according to claim 1 is used to produce a second object module intended for
another target computer system, said second object module corresponding to a second source program
different from said first source program.

11. A compiler for use in a computer to perform code generation during the compilation of a source program to
produce a corresponding object module, the compiler comprising:

a code comparator that compares a portion of an intermediate language graph (55) with a code template, said
intermediate language graph comprising tuples (35) and being a representation of a first source program (21)
in an intermediate language, each tuple representing a single operation to be performed in said first
source program, said code template comprising a predetermined intermediate language pattern that
corresponds to said portion of said intermediate language graph, to guide the generation of a portion of a
first object module that corresponds to said first source program;

a code analyzer for analyzing said portion of said intermediate language graph to determine an order of
evaluation of an expression in said portion of said intermediate language graph, using actions that are
included in said code template and that indicate said order of evaluation;

an allocating means that uses said actions in accordance with said order of evaluation to allocate a
temporary name for a variable and to assign an allocation lifetime to said temporary name, said allocation
lifetime indicating the extent to which the temporary name and the storage of the temporary name are
associated with said variable in said intermediate language graph;

means for updating a tuple in said portion of said intermediate language graph as required by the analyzer
and the allocating means to perform, respectively, the analysis and the allocation as indicated by said
actions; and

means for generating machine language instructions that are included in said portion of said object
module (23), using said actions and said intermediate language graph having updated tuples.

12. Apparatus according to claim 11, wherein said allocating means further comprises a local temporary name
allocating means and a non-local temporary name allocating means, said local temporary name allocating
means allocating variables having a lifetime that is limited to said portion of said intermediate language
graph, and said non-local temporary name allocating means allocating variables having a lifetime that
extends beyond said portion of said intermediate language graph.

13. Apparatus according to claim 11, wherein said predetermined language pattern comprises a result value
mode, a pattern tree, a sequence of Boolean tests, and a cost of the code generated by said code template,
said result value mode indicating the result value computed by said portion of said first object module,
said pattern tree describing operators and operands of said predetermined intermediate language pattern,
said sequence of Boolean tests representing assertions about said portion of said intermediate language
graph that must be true for said predetermined intermediate language pattern to be matched to said portion,
and said cost being represented as an integer and indicating a performance cost associated with said
portion of generated code.

14. Apparatus according to claim 11, wherein said first object module is intended for a first target computer
system, and wherein the method according to claim 11 is used to produce a second object module intended for
another target computer system, said second object module corresponding to a second source program
different from said first source program.
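
The template structure recited in claims 8, 9 and 13 (a result value mode, a pattern tree, a sequence of Boolean tests, and an integer cost, with the minimum-cost template chosen among those that match) can be sketched as follows. This is only an illustration under stated assumptions: the `Node` and `Template` classes, the nested-tuple pattern encoding with `"*"` as a wildcard, and the function names `matches` and `select` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A node of the intermediate-language graph portion (hypothetical layout)."""
    op: str
    kids: Tuple["Node", ...] = ()
    value: Optional[int] = None


@dataclass
class Template:
    """A code template: result value mode, pattern tree, Boolean tests, integer cost."""
    name: str
    result_mode: str                      # e.g. "register" or "memory"
    pattern: object                       # nested tuple (op, child, ...) or "*" wildcard
    tests: List[Callable[[Node], bool]] = field(default_factory=list)
    cost: int = 1


def matches(pattern, node: Node) -> bool:
    """Structural match of a pattern tree against a graph node."""
    if pattern == "*":
        return True                       # wildcard matches any subtree
    op, *kids = pattern
    if op != node.op or len(kids) != len(node.kids):
        return False
    return all(matches(p, k) for p, k in zip(kids, node.kids))


def select(templates: List[Template], node: Node) -> Optional[Template]:
    """Return the cheapest template whose pattern matches and whose tests all hold."""
    candidates = [
        t for t in templates
        if matches(t.pattern, node) and all(test(node) for test in t.tests)
    ]
    return min(candidates, key=lambda t: t.cost, default=None)


templates = [
    Template("add", "register", ("add", "*", "*"), cost=2),
    Template("inc", "register", ("add", "*", ("lit",)),
             tests=[lambda n: n.kids[1].value == 1], cost=1),
]
expr = Node("add", (Node("fetch"), Node("lit", value=1)))
print(select(templates, expr).name)       # inc
```

Both templates match the expression here, and the Boolean test on the literal operand holds, so the cheaper `inc` template is selected; with a literal other than 1 the test fails and the general `add` template remains the only candidate.
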